Abstract

We consider the interference management problem in a multicell MIMO heterogeneous network. Within each
cell there is a large number of distributed micro/pico base stations (BSs) that can be potentially coordinated for joint
transmission. To reduce coordination overhead, we consider user-centric BS clustering so that each user is served by
only a small number of (potentially overlapping) BSs. Thus, given the channel state information, our objective is to
jointly design the BS clustering and the linear beamformers for all BSs in the network. In this paper, we formulate
this problem from a sparse optimization perspective, and propose an efficient algorithm that is based on iteratively
solving a sequence of group LASSO problems. A novel feature of the proposed algorithm is that it performs BS
clustering and beamformer design jointly rather than separately as is done in the existing approaches for partial
coordinated transmission. Moreover, the cluster size can be controlled by adjusting a single penalty parameter in
the nonsmooth regularized utility function. The convergence of the proposed algorithm (to a stationary solution) is
guaranteed, and its effectiveness is demonstrated via extensive simulation.

I. Introduction 

The design of future wireless cellular networks is on the verge of a major paradigm change. In order to 
accommodate the explosive demand for wireless data, the traditional wireless network architecture comprised of
a small number of high power base stations (BSs) has started to migrate to the so-called heterogeneous network
(HetNet) [1], [2]. In HetNet, each cell is composed of potentially a large number of densely deployed access nodes
such as macro/micro/pico BSs to provide coverage extension for cell edge and hotspot users 0. Unfortunately, close
proximity of many transmitters and receivers introduces substantial interference, which, if not properly managed,
can significantly affect the system performance.

The interference management problem in multicell downlink networks has been a topic of intensive research 
recently. It has been widely accepted that combining physical layer techniques such as multiple input multiple output 
(MIMO) antenna arrays with multi-cell coordination can effectively mitigate inter-cell and intra-cell interference 
(U-Q. There are two main approaches for the coordinated transmission and reception in a multi-cell MIMO 
network: joint processing (JP) and coordinated beamforming (CB) Q. In the first approach, the user data signals
are shared among the cooperating BSs. A single virtual BS can then be formed that transmits to all the users in
the system. Inter-BS interference is canceled by joint precoding and transmission among all the coordinated BSs.
In this case, either the capacity achieving non-linear dirty-paper coding (DPC) (see, e.g., B, Q), or simpler linear
precoding schemes such as zero-forcing (ZF) (see, e.g., [8]–[10]) can be used for joint transmission. However,
centralized processing is needed for the computation of the beamformers. Furthermore, this approach can require
heavy signaling overhead on the backhaul network (0, [11], [12]), especially when the number of cooperating BSs
in the network becomes large. 

When the benefit of full JP among the BSs is outweighed by the overhead, the BSs can choose CB as an 
alternative reduced coordination scheme. In particular, the beamformers are jointly optimized among the coordinated 
BSs to suppress excessive inter-BS interference. In this case, only local channel state information (CSI) and control 
information are exchanged among the coordinated BSs. One popular formulation for CB is to optimize the system
performance measured by a certain system utility function. Unfortunately, optimally solving the utility maximization
problem in a MIMO interfering network is computationally intractable in general (with a few exceptions, see
[13]–[16]). As a result, many works are devoted to finding high quality locally optimal solutions for different
network configurations, e.g., in MIMO/MISO interference channels (IC) [17]–[20] and MIMO/MISO interfering
broadcast channels (IBC) [21]–[24]. In particular, reference [24] proposed a weighted Minimum Mean Square Error
(WMMSE) algorithm that is able to compute locally optimal solutions for a broad class of system utility functions 
and for general network configurations. 

A different approach for limited coordination is to group the BSs into coordination clusters of small sizes, 
within which they perform JP. In this case, each user's data signals are only shared among a small number of its 
serving BSs, thus greatly reducing the overall backhaul signaling cost. Many recent works have developed various 
BS clustering strategies for such purpose, e.g., [9], [25]–[31], where clusters are formed either greedily or by an
exhaustive search procedure. Once the clusters are formed, various approaches can be used to design beamforming
strategies for each BS. For example, the authors of [28]–[30] utilized the ZF strategy for intra-cluster transmission
without assuming any inter-cluster cooperation. References O, [25] considered a hybrid cooperation strategy in
which CB is used for inter-cluster coordination. In this way, inter-cluster interference for cluster edge users is
also mitigated. In principle, clustering strategies should be designed in conjunction with the beamforming and BS
coordination strategies to strike the best tradeoff between system throughput performance and signaling overhead.

In this work, we consider the joint BS clustering and beamformer design problem in a downlink multicell 
HetNet for general partial coordinated transmission. In our formulation, the BSs that belong to the same cell can 
dynamically form (possibly overlapping) coordination clusters of small sizes for JP while the BSs in different cells 
perform CB. We formulate this problem from the perspective of sparse optimization. Specifically, if all the BSs 
that belong to the same cell are viewed as a single virtual BS, then its antennas can be partitioned into multiple 
groups (each corresponding to an individual BS). Moreover, the requirement that each user is served by a small 
number of BSs translates directly to the restriction that its virtual beamformer should have a group sparse structure, 
that is, the nonzero components of the virtual beamformer should correspond to only a small number of antenna 
groups. This interpretation inspires us to formulate a system utility maximization problem with a mixed $\ell_2/\ell_1$
regularization, as it is well known that such regularization induces the group sparse structure [32]. Incorporating
such nonsmooth regularization term into our objective ensures that the optimal beamformers possess the desired 
group-sparse structure. In this way, our proposed approach can be viewed as a single-stage formulation of the joint 
BS grouping and beamforming problem. The main contributions of this work are listed as follows. 

• We propose to jointly optimize the coordination clusters and linear beamformers in a large scale HetNet by 
solving a single-stage nonsmooth utility optimization problem. The system utility function (without nonsmooth 
penalization) can have a very general form that includes the popular weighted sum rate and proportional fair 
utility functions. This approach is different from the existing algorithms, which either require a predefined BS 
clustering and a fixed system utility function, or some multi-stage heuristic optimization. 

• Since the resulting nonsmooth utility maximization problem is difficult to solve due to its nonconvexity as well 
as its nonsmoothness, we transform this problem to an equivalent regularized weighted MSE minimization 
problem. The latter has several desirable features such as separability across the cells and convexity among 
different blocks of variables. This equivalence transformation substantially generalizes our previous result in 
[24], which only deals with smooth utility functions.

• We propose an efficient iterative algorithm that computes a stationary solution to the transformed problem. 
In each step of the algorithm the computation is closed-form and can be distributed to individual cells. The 
algorithm is shown to converge to a stationary solution to the original nonsmooth utility maximization problem, 
and its effectiveness is demonstrated via extensive simulation experiments. 

The rest of the paper is organized as follows. In Section II, we present the system model and formulate the problem
as a nonsmooth utility maximization problem. We then transform this problem into an equivalent regularized
weighted MSE minimization problem in Section III. In Section IV, an efficient algorithm is proposed to solve the
transformed problem. In Section V, numerical examples are provided to validate the proposed algorithm.

Notations: For a symmetric matrix $\mathbf{X}$, $\mathbf{X} \succeq 0$ signifies that $\mathbf{X}$ is positive semi-definite. We use $\mathrm{Tr}(\mathbf{X})$, $|\mathbf{X}|$, $\mathbf{X}^H$
and $\rho(\mathbf{X})$ to denote the trace, determinant, Hermitian transpose and spectral radius of a matrix, respectively. For a complex
scalar $x$, its complex conjugate is denoted by $\bar{x}$. For a vector $\mathbf{x}$, we use $\|\mathbf{x}\|$ to denote its $\ell_2$ norm. $\mathbf{I}_n$ is used to
denote an $n \times n$ identity matrix. We use $[y, \mathbf{x}_{-i}]$ to denote a vector $\mathbf{x}$ with its $i$th element replaced by $y$. We use
$\mathbb{R}^{N \times M}$ and $\mathbb{C}^{N \times M}$ to denote the set of real and complex $N \times M$ matrices; we use $\mathbb{S}^N$ and $\mathbb{S}^N_+$ to denote the set of
$N \times N$ Hermitian and Hermitian positive semi-definite matrices, respectively. We use the expression $0 \le a \perp b \ge 0$
to indicate $a \ge 0$, $b \ge 0$, $a \times b = 0$.

II. System Model and Problem Formulation 

Consider a downlink multi-cell HetNet consisting of a set $\mathcal{K} = \{1, \cdots, K\}$ of cells. Within each cell $k$ there
is a set $\mathcal{Q}_k = \{1, \cdots, Q_k\}$ of distributed base stations (BSs) (for instance, macro/micro/pico BSs) which provide
service to users located in different areas of the cell. Assume that in each cell $k$, there is a low-latency backhaul
network connecting the set of BSs $\mathcal{Q}_k$ to a central controller (usually the macro BS), and that the central controller
makes the resource allocation decisions for all BSs within the cell. Furthermore, this central entity has access to
the data signals of all the users in its cell. Let $\mathcal{I}_k = \{1, \cdots, I_k\}$ denote the users located in cell $k$. Each of the
users $i_k \in \mathcal{I}_k$ is served jointly by a subset of BSs in $\mathcal{Q}_k$. Let $\mathcal{I}$ denote the set of all the users. For simplicity of
notation, let us assume that each BS has $M$ transmit antennas, and each user has $N$ receive antennas. Throughout
the paper, we use $i, j$ to indicate the user index, $k, \ell$ for the cell index, and $q, p$ for the BS index. Let
$\mathbf{H}_{i_k}^{q_\ell} \in \mathbb{C}^{N \times M}$ denote the channel matrix between the $q$th BS in the $\ell$th cell and the $i$th user in the $k$th cell. Let
$\mathbf{H}_{i_k}^{\ell} \triangleq [\mathbf{H}_{i_k}^{1_\ell}, \cdots, \mathbf{H}_{i_k}^{Q_\ell}] \in \mathbb{C}^{N \times M Q_\ell}$ denote the channel matrix between all the BSs in the $\ell$th cell to the user $i_k$.

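Let $\mathbf{v}_{i_k}^{q_k} \in \mathbb{C}^{M \times 1}$ denote the transmit beamformer that BS $q_k$ uses to transmit a single stream of data signal
$s_{i_k} \in \mathbb{C}$ to user $i_k$. Define $\mathbf{v}_{i_k} \triangleq [(\mathbf{v}_{i_k}^{1_k})^H, \cdots, (\mathbf{v}_{i_k}^{Q_k})^H]^H \in \mathbb{C}^{M Q_k \times 1}$ as the collection of all beamformers intended
for user $i_k$. Let $\mathbf{v} \triangleq \{\mathbf{v}_{i_k}\}_{i_k \in \mathcal{I}}$ collect the beamformers of all users. Assume that there is a power budget constraint for each BS $q_k$, i.e.,

$$\sum_{i_k \in \mathcal{I}_k} (\mathbf{v}_{i_k}^{q_k})^H \mathbf{v}_{i_k}^{q_k} \le P_{q_k}, \quad \forall\, q_k \in \mathcal{Q}_k,\ \forall\, k \in \mathcal{K}. \tag{1}$$

Let $\mathbf{x}^{q_k} \in \mathbb{C}^{M \times 1}$ denote the transmitted signal of BS $q_k$, and let $\mathbf{x}^{k} = [(\mathbf{x}^{1_k})^H, \cdots, (\mathbf{x}^{Q_k})^H]^H \in \mathbb{C}^{M Q_k \times 1}$ denote
the collection of transmitted signals of all the BSs in cell $k$, i.e.,

$$\mathbf{x}^{q_k} = \sum_{i_k \in \mathcal{I}_k} \mathbf{v}_{i_k}^{q_k} s_{i_k}, \qquad \mathbf{x}^{k} = \sum_{i_k \in \mathcal{I}_k} \mathbf{v}_{i_k} s_{i_k}.$$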

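The received signal $\mathbf{y}_{i_k} \in \mathbb{C}^{N \times 1}$ of user $i_k$ is

$$\mathbf{y}_{i_k} = \sum_{\ell \in \mathcal{K}} \mathbf{H}_{i_k}^{\ell} \mathbf{x}^{\ell} + \mathbf{z}_{i_k}
= \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k} s_{i_k}
+ \underbrace{\sum_{j_k \ne i_k} \mathbf{H}_{i_k}^{k} \mathbf{v}_{j_k} s_{j_k}}_{\text{intra-cell interference}}
+ \underbrace{\sum_{\ell \ne k} \sum_{j_\ell \in \mathcal{I}_\ell} \mathbf{H}_{i_k}^{\ell} \mathbf{v}_{j_\ell} s_{j_\ell}}_{\text{inter-cell interference}}
+ \mathbf{z}_{i_k} \tag{2}$$

where $\mathbf{z}_{i_k} \in \mathbb{C}^{N \times 1}$ is the additive white Gaussian noise with distribution $\mathcal{CN}(0, \sigma_{i_k}^2 \mathbf{I}_N)$.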



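Let $\mathbf{u}_{i_k} \in \mathbb{C}^{N \times 1}$ denote the receive beamformer used by user $i_k$ to decode the intended signal. Then the estimated
signal for user $i_k$ is $\hat{s}_{i_k} = \mathbf{u}_{i_k}^H \mathbf{y}_{i_k}$. The mean square error (MSE) for user $i_k$ can be written as

$$\begin{aligned}
e_{i_k} &\triangleq \mathbb{E}\big[(\hat{s}_{i_k} - s_{i_k})(\hat{s}_{i_k} - s_{i_k})^H\big] \\
&= (1 - \mathbf{u}_{i_k}^H \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k})(1 - \mathbf{u}_{i_k}^H \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k})^H
+ \sum_{(\ell, j) \ne (k, i)} \mathbf{u}_{i_k}^H \mathbf{H}_{i_k}^{\ell} \mathbf{v}_{j_\ell} \mathbf{v}_{j_\ell}^H (\mathbf{H}_{i_k}^{\ell})^H \mathbf{u}_{i_k}
+ \sigma_{i_k}^2 \mathbf{u}_{i_k}^H \mathbf{u}_{i_k}.
\end{aligned} \tag{3}$$

The MMSE receiver minimizes user $i_k$'s MSE, and can be expressed as

$$\mathbf{u}_{i_k}^{\mathrm{mmse}} = \Big(\sum_{(\ell, j)} \mathbf{H}_{i_k}^{\ell} \mathbf{v}_{j_\ell} \mathbf{v}_{j_\ell}^H (\mathbf{H}_{i_k}^{\ell})^H + \sigma_{i_k}^2 \mathbf{I}\Big)^{-1} \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k}
\triangleq \mathbf{C}_{i_k}^{-1} \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k} \tag{4}$$

where $\mathbf{C}_{i_k}$ denotes user $i_k$'s received signal covariance matrix. The minimum MSE for user $i_k$ when the MMSE
receiver is used can be expressed as

$$e_{i_k}^{\mathrm{mmse}} = 1 - (\mathbf{v}_{i_k})^H (\mathbf{H}_{i_k}^{k})^H \mathbf{C}_{i_k}^{-1} \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k}. \tag{5}$$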

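Clearly, we have $1 - e_{i_k}^{\mathrm{mmse}} \ge 0$. Let us assume that Gaussian signaling is used and the interference is treated
as noise. If we assume that all the BSs in cell $k$ form a single virtual BS, then $\mathbf{v}_{i_k}$ can be viewed as the virtual
beamformer for user $i_k$. The achievable rate for user $i_k$ is given by [33]

$$R_{i_k} = \log \Big| \mathbf{I}_N + \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k} \mathbf{v}_{i_k}^H (\mathbf{H}_{i_k}^{k})^H
\Big(\sum_{(\ell, j) \ne (k, i)} \mathbf{H}_{i_k}^{\ell} \mathbf{v}_{j_\ell} \mathbf{v}_{j_\ell}^H (\mathbf{H}_{i_k}^{\ell})^H + \sigma_{i_k}^2 \mathbf{I}_N\Big)^{-1} \Big|. \tag{6}$$

The above expression suggests that each user can always use the MMSE receiver (4), since it preserves the achievable
data rate when the interference is treated simply as noise. We will occasionally use the notations $R_{i_k}(\mathbf{v})$, $\mathbf{C}_{i_k}(\mathbf{v})$
to make their dependencies on $\mathbf{v}$ explicit.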
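The link between the MMSE receiver (4), the minimum MSE (5) and the rate (6) can be checked numerically in the simplest setting. The following is a minimal sketch, assuming a single receive antenna ($N = 1$) so that every quantity is a scalar; the function name `mmse_and_rate` and its argument names are illustrative and do not come from the paper.

```python
import math

def mmse_and_rate(signal_gain, interference_gains, noise_power):
    """Scalar (N = 1) versions of (4), (5) and (6).

    signal_gain        : complex effective channel H v of the intended stream
    interference_gains : complex effective channels H v_j of all interfering streams
    noise_power        : noise variance sigma^2
    """
    s = abs(signal_gain) ** 2
    i_plus_n = sum(abs(g) ** 2 for g in interference_gains) + noise_power
    c = s + i_plus_n                     # received signal covariance C (a scalar here)
    u = signal_gain / c                  # MMSE receiver, cf. (4)
    e = 1.0 - s / c                      # minimum MSE, cf. (5)
    rate = math.log(1.0 + s / i_plus_n)  # achievable rate in nats, cf. (6)
    return u, e, rate

u, e, rate = mmse_and_rate(1 + 1j, [0.5, 0.5j], 1.0)
print(rate, -math.log(e))  # the two numbers agree
```

In this scalar case the rate equals $-\log(e^{\mathrm{mmse}})$, which is the relation exploited by the weighted MSE reformulation later in the paper.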

Notice that the rate (6) can only be achieved when all the BSs in cell $k$ perform a full JP. Unfortunately, this
requires the data signal for user $i_k$ to be known at all BSs in $\mathcal{Q}_k$, causing significant signaling overhead, especially
when the number of users and BSs becomes large. To reduce overhead, partial cooperative transmission is preferred
whereby each user is served by not all, but a subset, of BSs in the cell. Mathematically, we are interested in jointly
performing the following two tasks: i) for each user $i_k$, identify a small subset of serving BSs $\mathcal{S}_{i_k} \subseteq \mathcal{Q}_k$ such
that $\mathbf{v}_{i_k}^{q_k} = \mathbf{0}$ for each $q_k \notin \mathcal{S}_{i_k}$; ii) optimize the transmit beamformers $\{\mathbf{v}_{i_k}^{q_k}\}_{q_k \in \mathcal{S}_{i_k},\, i_k \in \mathcal{I}_k}$ to achieve high system
throughput and/or fairness level. With the partial JP, user $i_k$'s data signal needs to be shared only among BSs in
$\mathcal{S}_{i_k}$, rather than among all BSs in $\mathcal{Q}_k$.

The requirement that $|\mathcal{S}_{i_k}|$ is small translates to the restriction that $\mathbf{v}_{i_k}$ should contain only a few nonzero block
components (i.e., most of the beamformers $\{\mathbf{v}_{i_k}^{q_k}\}_{q_k \in \mathcal{Q}_k}$ should be set to zero). This structure of the beamformer
$\mathbf{v}_{i_k}$ is referred to as the group sparse structure [32]. Recovering group sparse solutions of optimization problems
has recently found applications in many fields such as machine learning [34], microarray data analysis [35],
signal processing [36], [37] and communications [38]. One popular approach to enforce the sparsity of the solution
to an optimization problem is to penalize the objective function with a group-sparse encouraging penalty such as
the mixed $\ell_2/\ell_1$ norm [32]. In our case, such norm can be expressed as $\sum_{q_k \in \mathcal{Q}_k} \|\mathbf{v}_{i_k}^{q_k}\|$, which is the $\ell_1$ norm of
the vector consisting of the $\ell_2$ norms $\{\|\mathbf{v}_{i_k}^{q_k}\|\}_{q_k \in \mathcal{Q}_k}$. The resulting penalized problem is usually referred to as the group
least-absolute shrinkage and selection operator (LASSO) problem.
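To make the penalty concrete, the sketch below evaluates the mixed $\ell_2/\ell_1$ norm of one user's virtual beamformer and reads off the implied serving set $\mathcal{S}_{i_k}$ from its nonzero blocks. It assumes the beamformer is stored as a list of per-BS blocks of complex numbers; the helper names `mixed_l2_l1_norm` and `serving_set` are hypothetical.

```python
import math

def block_norm(block):
    """l2 norm of one BS's beamformer block."""
    return math.sqrt(sum(abs(x) ** 2 for x in block))

def mixed_l2_l1_norm(groups):
    """Sum over BSs of the l2 norm of each block (the l1 norm of the block norms)."""
    return sum(block_norm(block) for block in groups)

def serving_set(groups, tol=0.0):
    """Indices of the BSs whose block is nonzero, i.e. the BSs that serve the user."""
    return [q for q, block in enumerate(groups) if block_norm(block) > tol]

# Three BSs with two antennas each; the second BS does not serve the user.
v_user = [[3 + 4j, 0j], [0j, 0j], [0j, 1j]]
print(mixed_l2_l1_norm(v_user))  # 6.0
print(serving_set(v_user))       # [0, 2]
```

A block only leaves the sum when the whole block is exactly zero, which is why penalizing this quantity tends to switch off entire BSs rather than individual antennas.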

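With the goal of inducing group-sparse structure of the beamformers $\{\mathbf{v}_{i_k}\}_{i_k \in \mathcal{I}}$ as well as optimizing system-level
performance, we propose to design the linear transmit beamformers by solving the following problem

$$\begin{aligned}
(\mathrm{P1}) \quad \max_{\{\mathbf{v}_{i_k}^{q_k}\}} \quad & \sum_{k \in \mathcal{K}} \sum_{i_k \in \mathcal{I}_k} \Big( u_{i_k}(R_{i_k}) - \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|\mathbf{v}_{i_k}^{q_k}\| \Big) \\
\text{s.t.} \quad & \sum_{i_k \in \mathcal{I}_k} (\mathbf{v}_{i_k}^{q_k})^H \mathbf{v}_{i_k}^{q_k} \le P_{q_k}, \quad \forall\, q_k \in \mathcal{Q}_k,\ \forall\, k \in \mathcal{K}
\end{aligned} \tag{7}$$

where $u_{i_k}(\cdot)$ denotes user $i_k$'s utility function, and $\{\lambda_k \ge 0\}_{k \in \mathcal{K}}$ is the set of parameters that control the level
of sparsity within each cell. Penalizing the objective with the nonsmooth mixed $\ell_2/\ell_1$ norm induces the group
sparsity of $\mathbf{v}_{i_k}$. To see this, note that the group sparsity of $\mathbf{v}_{i_k}$ can be characterized by the sparsity of the vector
$\{\|\mathbf{v}_{i_k}^{q_k}\|\}_{q_k \in \mathcal{Q}_k}$, and the $\ell_1$ norm of this vector, which is what we use in (7), is a good approximation of its $\ell_0$ norm
(defined as the number of nonzero entries of the vector); see the literature on compressive sensing [39].
Reference [32] contains more discussion on using the mixed $\ell_2/\ell_1$ norm to recover the group sparsity.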

Unfortunately, solving (P1) directly is challenging. One reason is that when $\lambda_k = 0$, $\forall\, k$, (P1) becomes a sum
utility maximization problem for an interfering broadcast channel, which is proven to be NP-hard for many common
utility functions (see [13]–[16]). Another reason is that most existing algorithms for solving the nonsmooth group
LASSO problem, such as [32], [36], [40], [41], only work for the case that the smooth part of the objective is
convex and quadratic. We will provide in the next section an equivalent reformulation of this problem in which
these difficulties are circumvented. The reformulated problem can be solved (to a stationary solution) via solving
a series of convex problems.



III. Equivalent Formulation 

In this section, we develop a general equivalence relationship between the utility maximization problem (P1)
and a regularized weighted MSE minimization problem. This result is a generalization of a recent equivalence
relationship developed in [24] to the nonsmooth setting. The proofs of the results in this section can be found in
Appendix A.

A. Single User Per Cell with Sum Rate Utility 

For ease of presentation, we first consider a simpler case in which there is a single user in each cell. This scenario 
is of interest when different mobiles in each cell are scheduled to orthogonal time/frequency resources, and we 
consider one of such resources. We also focus on using the sum rate utility function. Generalizations to multiple 
users per-cell case with more general utility functions will be given in the next subsection. 

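Since there is a single user in each cell, we denote the user in the $k$th cell as user $k$. We use $\mathbf{v}_k^{q_k}$ and $\mathbf{H}_k^{q_\ell}$ to
denote BS $q_k$'s beamformer for user $k$, and the channel from BS $q_\ell$ to user $k$, respectively. Define $R_k$, $e_k$, $\mathbf{v}_k$,
$\mathbf{H}_k^{\ell}$ and $\mathbf{u}_k$ similarly. Using the sum rate as the system utility function, the sparse beamforming problem for this
network configuration is given as

$$\begin{aligned}
\max_{\{\mathbf{v}_k^{q_k}\}} \quad & \sum_{k \in \mathcal{K}} \Big( R_k - \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|\mathbf{v}_k^{q_k}\| \Big) \\
\text{s.t.} \quad & (\mathbf{v}_k^{q_k})^H \mathbf{v}_k^{q_k} \le P_{q_k}, \quad \forall\, q_k \in \mathcal{Q}_k,\ \forall\, k \in \mathcal{K}.
\end{aligned} \tag{8}$$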

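Let us introduce a set of new weight variables $\{w_k\}_{k \in \mathcal{K}}$. Consider the following regularized weighted MSE
minimization problem

$$\begin{aligned}
\min_{\{\mathbf{v}_k^{q_k}\},\{\mathbf{u}_k\},\{w_k\}} \quad & \sum_{k \in \mathcal{K}} \Big( w_k e_k - \log(w_k) + \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|\mathbf{v}_k^{q_k}\| \Big) \\
\text{s.t.} \quad & (\mathbf{v}_k^{q_k})^H \mathbf{v}_k^{q_k} \le P_{q_k}, \quad \forall\, q_k \in \mathcal{Q}_k,\ \forall\, k \in \mathcal{K}.
\end{aligned} \tag{9}$$

One immediate observation is that fixing $\mathbf{v}$, $\mathbf{u}$ and solving for $w$ admits a closed form solution: $w_k = \frac{1}{e_k}$, $\forall\, k$.
This property will be used in the following to derive the equivalence relationship between problems (8) and (9).
We refer the readers to [24, Section II.A] for a simple example that motivates this equivalence in the case $\lambda_k = 0$.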
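The closed form for the weights can be verified in one line. The short derivation below is a sketch that uses only the objective of (9) with $\mathbf{v}$ and $\mathbf{u}$ held fixed:

```latex
\frac{\partial}{\partial w_k}\bigl(w_k e_k - \log w_k\bigr) = e_k - \frac{1}{w_k} = 0
\;\Longrightarrow\; w_k = \frac{1}{e_k},
\qquad
\min_{w_k > 0}\bigl(w_k e_k - \log w_k\bigr) = 1 + \log e_k .
```

If, in addition, $\mathbf{u}_k$ is the MMSE receiver, then $e_k = e_k^{\mathrm{mmse}}$ and, for a single stream with interference treated as noise, $\log e_k^{\mathrm{mmse}} = -R_k$. The objective of (9) then reduces to $\sum_{k}\big(1 - R_k + \lambda_k \sum_{q_k} \|\mathbf{v}_k^{q_k}\|\big)$, which is the negative of the objective of (8) up to the constant $K$. This is the intuition behind the equivalence; the formal statement concerns stationary points and is given in Proposition 1.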

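To formally derive the equivalence relationship, the following definitions (see [42]) of stationary points of a
nonsmooth function are needed. Note that stationarity is a necessary condition for both global and local optimality.
Let $\mathbf{x} = [\mathbf{x}_1, \cdots, \mathbf{x}_K]$ be a vector of variables, in which $\mathbf{x}_k \in \mathbb{C}^{N_k}$. Let $f(\cdot): \mathbb{C}^{\sum_k N_k \times 1} \to \mathbb{R}$ be a real valued
(possibly nonsmooth) continuous function.

Definition 1: $\mathbf{x}^*$ is a stationary point of the problem $\min f(\mathbf{x})$ if $\mathbf{x}^* \in \mathrm{dom}(f)$ and $f'(\mathbf{x}^*; \mathbf{d}) \ge 0$, $\forall\, \mathbf{d}$, where
$f'(\mathbf{x}^*; \mathbf{d})$ is the directional derivative of $f(\cdot)$ at $\mathbf{x}^*$ in the direction $\mathbf{d}$

$$f'(\mathbf{x}^*; \mathbf{d}) = \liminf_{\lambda \downarrow 0} \frac{f(\mathbf{x}^* + \lambda \mathbf{d}) - f(\mathbf{x}^*)}{\lambda}.$$

Definition 2: $\mathbf{x}^*$ is a coordinatewise stationary point of $\min f(\mathbf{x})$ if $\mathbf{x}^* \in \mathrm{dom}(f)$ and

$$f(\mathbf{x}^* + [0, \cdots, 0, \mathbf{d}_k, 0, \cdots, 0]) \ge f(\mathbf{x}^*), \quad \forall\, \mathbf{d}_k \in \mathbb{C}^{N_k},\
\forall\, \mathbf{x}^* + [0, \cdots, 0, \mathbf{d}_k, 0, \cdots, 0] \in \mathrm{dom}(f),\ \forall\, k = 1, \cdots, K$$

where $[0, \cdots, 0, \mathbf{d}_k, 0, \cdots, 0]$ denotes a vector with all zero components except for its $k$th block.

Definition 3: The function $f(\cdot)$ is regular at $\mathbf{x}^* \in \mathrm{dom}(f)$ if $f'(\mathbf{x}^*; (0, \cdots, 0, \mathbf{d}_k, 0, \cdots, 0)) \ge 0$, $\forall\, \mathbf{d}_k \in \mathbb{C}^{N_k}$,
$\forall\, k = 1, \cdots, K$ implies $f'(\mathbf{x}^*; \mathbf{d}) \ge 0$, $\forall\, \mathbf{d} = [\mathbf{d}_1, \cdots, \mathbf{d}_K]$.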
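As an illustration of Definition 1 on a nonsmooth function, consider the following one-dimensional example. It is an assumed toy case with real $x$ and constants $a \in \mathbb{R}$, $\lambda \ge 0$; it is not taken from the paper.

```latex
f(x) = (x - a)^2 + \lambda |x|, \qquad
f'(0; d) = \lim_{t \downarrow 0} \frac{(t d - a)^2 - a^2 + \lambda |t d|}{t} = -2 a d + \lambda |d| .
```

Hence $f'(0; d) \ge 0$ for every direction $d$ if and only if $|a| \le \lambda/2$. The point $x^* = 0$ is therefore stationary exactly when the unpenalized minimizer $a$ is small relative to the penalty weight, even though $f$ is not differentiable at $0$. The same mechanism underlies the group-sparsity threshold derived in Section IV.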

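We establish the equivalence between problems (8) and (9) in the following proposition.

Proposition 1: If $(\mathbf{v}^*, \mathbf{u}^*, \mathbf{w}^*)$ is a stationary solution to problem (9), then $\mathbf{v}^*$ must be a stationary solution
to problem (8). Conversely, if $\mathbf{v}^*$ is a stationary solution to problem (8), then the tuple $(\mathbf{v}^*, \mathbf{u}^*, \mathbf{w}^*)$ must be a
stationary solution to problem (9), where

$$\mathbf{u}_k^* = \mathbf{C}_k^{-1}(\mathbf{v}^*) \mathbf{H}_k^{k} \mathbf{v}_k^*, \qquad
w_k^* = \big(1 - (\mathbf{v}_k^*)^H (\mathbf{H}_k^{k})^H \mathbf{C}_k^{-1}(\mathbf{v}^*) \mathbf{H}_k^{k} \mathbf{v}_k^*\big)^{-1}, \quad \forall\, k \in \mathcal{K}$$

with $\mathbf{C}_k(\mathbf{v}^*) \triangleq \sum_{\ell \in \mathcal{K}} \mathbf{H}_k^{\ell} \mathbf{v}_\ell^* (\mathbf{v}_\ell^*)^H (\mathbf{H}_k^{\ell})^H + \sigma_k^2 \mathbf{I}$.

Moreover, the global optimal solutions $\mathbf{v}^*$ for these two problems are identical.

Notice that $\mathbf{u}_k^*$ and $w_k^*$ introduced in Proposition 1 are the MMSE receiver and the inverse MMSE corresponding
to the transmit beamformer $\mathbf{v}^*$ (cf. (4) and (5)), respectively.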

B. Multiple Users Per Cell with More General Utility 

In this section, we generalize the equivalence relationship presented in the previous section to the case with more 
general utility function and multiple users per cell. 

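Consider the utility function $u_{i_k}(R_{i_k})$ that satisfies the following two conditions:

C1) $u_{i_k}(x)$ is concave and strictly increasing in $x$;

C2) $u_{i_k}(-\log(x))$ is strictly convex in $x$ for all $x$ satisfying $1 \ge x \ge 0$.

Note that this family of utility functions includes several well known utilities such as weighted sum rate and
geometric mean of one plus rates [24]. Let $\{w_{i_k}\}_{i_k \in \mathcal{I}}$ be a set of real-valued weights. Let $\gamma_{i_k}(\cdot): \mathbb{R} \to \mathbb{R}$ denote the
inverse function of the derivative $-\frac{d\, u_{i_k}(-\log(e_{i_k}))}{d\, e_{i_k}}$. Consider the following regularized weighted MSE minimization
problem

$$\begin{aligned}
(\mathrm{P2}) \quad \min_{\{\mathbf{v}_{i_k}\},\{\mathbf{u}_{i_k}\},\{w_{i_k}\}} \quad & \sum_{k \in \mathcal{K}} \sum_{i_k \in \mathcal{I}_k} \Big( w_{i_k} e_{i_k} - u_{i_k}\big(-\log(\gamma_{i_k}(w_{i_k}))\big)
- w_{i_k} \gamma_{i_k}(w_{i_k}) + \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|\mathbf{v}_{i_k}^{q_k}\| \Big) \\
\text{s.t.} \quad & \sum_{i_k \in \mathcal{I}_k} (\mathbf{v}_{i_k}^{q_k})^H \mathbf{v}_{i_k}^{q_k} \le P_{q_k}, \quad \forall\, q_k \in \mathcal{Q}_k,\ \forall\, k \in \mathcal{K}
\end{aligned}$$

where $e_{i_k}$ is defined in (3).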

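Similar to Proposition 1, we can establish the following equivalence relationship.

Proposition 2: Suppose for each $i_k \in \mathcal{I}_k$, the utility function $u_{i_k}(\cdot)$ satisfies the conditions C1)–C2). If $(\mathbf{v}^*, \mathbf{u}^*, \mathbf{w}^*)$
is a stationary solution to problem (P2), then $\mathbf{v}^*$ must be a stationary solution to problem (P1). Conversely, if
$\mathbf{v}^*$ is a stationary solution to problem (P1), then the tuple $(\mathbf{v}^*, \mathbf{u}^*, \mathbf{w}^*)$ must be a stationary solution to problem
(P2), where

$$\mathbf{u}_{i_k}^* = \mathbf{C}_{i_k}^{-1}(\mathbf{v}^*) \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k}^*, \qquad
w_{i_k}^* = \alpha_{i_k}^* \big(1 - (\mathbf{v}_{i_k}^*)^H (\mathbf{H}_{i_k}^{k})^H \mathbf{C}_{i_k}^{-1}(\mathbf{v}^*) \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k}^*\big)^{-1}$$

with $\alpha_{i_k}^* \triangleq \frac{d\, u_{i_k}(R_{i_k})}{d\, R_{i_k}} \Big|_{R_{i_k} = R_{i_k}(\mathbf{v}^*)} > 0$.

Moreover, the global optimal solutions $\mathbf{v}^*$ for these two problems are identical.

We remark that $\mathbf{u}^*$ is again the MMSE receiver corresponding to $\mathbf{v}^*$; $\mathbf{w}^*$ takes a similar form to that in the
statement of Proposition 1, except for the inclusion of a positive weight $\alpha^*$. The positivity of $\alpha^*$ comes from the
assumption C1).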

IV. Joint Clustering and Beamformer Design 

In this section, we will develop an efficient iterative algorithm for the general nonsmooth utility maximization
problem (P1). Due to the equivalence of this problem and the regularized weighted sum-MSE minimization problem
(P2), we can focus on solving the latter. We will employ the block coordinate descent (BCD) method [42] for such
purpose.

A. The Algorithm 

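It is straightforward to verify that the objective of problem (P2) is convex w.r.t. each variable $\mathbf{v}$, $\mathbf{u}$, $\mathbf{w}$. When
$\mathbf{v}$, $\mathbf{w}$ are fixed, the optimal $\mathbf{u}^*$ is the MMSE receiver $\mathbf{u}_{i_k}^* = \mathbf{C}_{i_k}^{-1} \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k}$, $\forall\, i_k \in \mathcal{I}$. When $\mathbf{v}$, $\mathbf{u}$ are fixed, the optimal
$\mathbf{w}^*$ takes the following form

$$w_{i_k}^* = \frac{d\, u_{i_k}(R_{i_k})}{d\, R_{i_k}} \Big|_{R_{i_k} = R_{i_k}(\mathbf{v})} \times \frac{1}{e_{i_k}} > 0, \quad \forall\, i_k \in \mathcal{I}.$$

The positivity of $w_{i_k}^*$ comes from the fact that $e_{i_k} > 0$ and the utility function $u_{i_k}(\cdot)$ is strictly increasing w.r.t. user
$i_k$'s rate.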
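The weight update above is simple enough to sketch directly. The snippet below assumes that the receiver has just been set to the MMSE receiver, so that the rate can be recovered from the MSE as $R = -\log(e)$; the function name `weight_update` and the way the utility derivative is passed in are illustrative choices, not the paper's notation.

```python
import math

def weight_update(e_mmse, du_dR):
    """Optimal weight w* = u'(R) / e, with R = -log(e) under the MMSE receiver.

    e_mmse : minimum MSE of the user, a real number in (0, 1]
    du_dR  : callable returning the derivative of the utility at a given rate
    """
    rate = -math.log(e_mmse)
    return du_dR(rate) / e_mmse

# Sum-rate utility u(R) = R, so u'(R) = 1 and the weight is simply 1 / e.
print(weight_update(0.25, lambda R: 1.0))              # 4.0
# Utility u(R) = log(1 + R), so u'(R) = 1 / (1 + R).
print(weight_update(0.25, lambda R: 1.0 / (1.0 + R)))
```

For the sum-rate utility this recovers $w_k = 1/e_k$ from the single-user case; other utilities only rescale the weight by the positive factor $u'(R)$.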



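The main part of the algorithm is to find the optimal transmit beamformers $\mathbf{v}$ when $\mathbf{u}$, $\mathbf{w}$ are fixed. Observe that
when fixing $\mathbf{u}$, $\mathbf{w}$, problem (P2) can be decomposed into $K$ independent convex problems (one for each cell)

$$\begin{aligned}
(\mathrm{P3}) \quad \min_{\{\mathbf{v}_{i_k}\}} \quad & \sum_{i_k \in \mathcal{I}_k} \Big( \mathbf{v}_{i_k}^H \Big( \sum_{j_\ell \in \mathcal{I}} w_{j_\ell} (\mathbf{H}_{j_\ell}^{k})^H \mathbf{u}_{j_\ell} \mathbf{u}_{j_\ell}^H \mathbf{H}_{j_\ell}^{k} \Big) \mathbf{v}_{i_k}
- w_{i_k} \mathbf{v}_{i_k}^H (\mathbf{H}_{i_k}^{k})^H \mathbf{u}_{i_k} - w_{i_k} \mathbf{u}_{i_k}^H \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k}
+ \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|\mathbf{v}_{i_k}^{q_k}\| \Big) \\
\text{s.t.} \quad & \sum_{i_k \in \mathcal{I}_k} (\mathbf{v}_{i_k}^{q_k})^H \mathbf{v}_{i_k}^{q_k} \le P_{q_k}, \quad \forall\, q_k \in \mathcal{Q}_k.
\end{aligned}$$

Let us focus on solving one of such problems. Note that the constraint set of this problem is separable among
the beamformers of different BSs. This suggests that we can obtain its optimal solution again by a BCD method,
with $\{\mathbf{v}_{i_k}^{q_k}\}_{i_k \in \mathcal{I}_k}$ as one block of variables. In particular, we will solve (P3) by sequentially solving the following
problem for each block $\{\mathbf{v}_{i_k}^{q_k}\}_{i_k \in \mathcal{I}_k}$

$$\begin{aligned}
(\mathrm{P4}) \quad \min_{\{\mathbf{v}_{i_k}^{q_k}\}_{i_k \in \mathcal{I}_k}} \quad & \sum_{i_k \in \mathcal{I}_k} \Big( \mathbf{v}_{i_k}^H \Big( \sum_{j_\ell \in \mathcal{I}} w_{j_\ell} (\mathbf{H}_{j_\ell}^{k})^H \mathbf{u}_{j_\ell} \mathbf{u}_{j_\ell}^H \mathbf{H}_{j_\ell}^{k} \Big) \mathbf{v}_{i_k}
- w_{i_k} \mathbf{v}_{i_k}^H (\mathbf{H}_{i_k}^{k})^H \mathbf{u}_{i_k} - w_{i_k} \mathbf{u}_{i_k}^H \mathbf{H}_{i_k}^{k} \mathbf{v}_{i_k}
+ \lambda_k \|\mathbf{v}_{i_k}^{q_k}\| \Big) \\
\text{s.t.} \quad & \sum_{i_k \in \mathcal{I}_k} (\mathbf{v}_{i_k}^{q_k})^H \mathbf{v}_{i_k}^{q_k} \le P_{q_k}.
\end{aligned}$$

This problem is a quadratically constrained group-LASSO problem. The presence of the additional sum power
constraint prevents the direct application of the algorithms (e.g., [32], [36], [40], [41]) for the conventional unconstrained
group-LASSO problem. In the following we will derive a customized algorithm for solving this problem.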
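Define the following two sets of variables

$$\mathbf{J}_k \triangleq \sum_{j_\ell \in \mathcal{I}} w_{j_\ell} (\mathbf{H}_{j_\ell}^{k})^H \mathbf{u}_{j_\ell} \mathbf{u}_{j_\ell}^H \mathbf{H}_{j_\ell}^{k} \in \mathbb{S}_+^{M Q_k} \tag{10}$$

$$\mathbf{d}_{i_k} \triangleq w_{i_k} (\mathbf{H}_{i_k}^{k})^H \mathbf{u}_{i_k} \in \mathbb{C}^{M Q_k \times 1}, \quad \forall\, i_k \in \mathcal{I}_k. \tag{11}$$

Partition $\mathbf{J}_k$ and $\mathbf{d}_{i_k}$ into the following form

$$\mathbf{J}_k = \begin{bmatrix} \mathbf{J}_k[1,1] & \cdots & \mathbf{J}_k[1,Q_k] \\ \vdots & \ddots & \vdots \\ \mathbf{J}_k[Q_k,1] & \cdots & \mathbf{J}_k[Q_k,Q_k] \end{bmatrix}, \qquad
\mathbf{d}_{i_k} = \big[\mathbf{d}_{i_k}^H[1], \cdots, \mathbf{d}_{i_k}^H[Q_k]\big]^H \tag{12}$$

where $\mathbf{J}_k[q,p] \in \mathbb{C}^{M \times M}$, $\forall\, (q,p) \in \mathcal{Q}_k \times \mathcal{Q}_k$, and $\mathbf{d}_{i_k}[q] \in \mathbb{C}^{M \times 1}$, $\forall\, q \in \mathcal{Q}_k$. Utilizing these definitions, the
objective of problem (P4) reduces to

$$\sum_{i_k \in \mathcal{I}_k} \big( \mathbf{v}_{i_k}^H \mathbf{J}_k \mathbf{v}_{i_k} - \mathbf{v}_{i_k}^H \mathbf{d}_{i_k} - \mathbf{d}_{i_k}^H \mathbf{v}_{i_k} \big) + \lambda_k \sum_{i_k \in \mathcal{I}_k} \|\mathbf{v}_{i_k}^{q_k}\|
\triangleq \sum_{i_k \in \mathcal{I}_k} f_{i_k}(\mathbf{v}_{i_k}) + \lambda_k \sum_{i_k \in \mathcal{I}_k} \|\mathbf{v}_{i_k}^{q_k}\|.$$

The gradient of the smooth function $f_{i_k}(\mathbf{v}_{i_k})$ w.r.t. $\mathbf{v}_{i_k}^{q_k}$ can be expressed as

$$\begin{aligned}
\nabla_{\mathbf{v}_{i_k}^{q_k}} f_{i_k}(\mathbf{v}_{i_k}) &= 2 \Big( \mathbf{J}_k[q,q] \mathbf{v}_{i_k}^{q_k} + \sum_{p \ne q} \mathbf{J}_k[q,p] \mathbf{v}_{i_k}^{p_k} - \mathbf{d}_{i_k}[q] \Big) && (13) \\
&\triangleq 2 \big( \mathbf{J}_k[q,q] \mathbf{v}_{i_k}^{q_k} - \mathbf{c}_{i_k}^{q_k} \big) && (14)
\end{aligned}$$

where we have defined

$$\mathbf{c}_{i_k}^{q_k} \triangleq \mathbf{d}_{i_k}[q] - \sum_{p \ne q} \mathbf{J}_k[q,p] \mathbf{v}_{i_k}^{p_k}. \tag{15}$$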

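Note that the gradient (13) given above is coupled with the other block variables $\{\mathbf{v}_{i_k}^{p_k}\}_{i_k \in \mathcal{I}_k}$, $p_k \ne q_k$, through the term
$\mathbf{c}_{i_k}^{q_k}$. The first order optimality condition for the convex problem (P4) is

$$-2 \big( \mu^{q_k} \mathbf{v}_{i_k}^{q_k} + (\mathbf{J}_k[q,q] \mathbf{v}_{i_k}^{q_k} - \mathbf{c}_{i_k}^{q_k}) \big) \in \lambda_k\, \partial(\|\mathbf{v}_{i_k}^{q_k}\|), \quad \forall\, i_k \in \mathcal{I}_k \tag{16}$$

$$0 \le \mu^{q_k} \perp \Big( P_{q_k} - \sum_{i_k \in \mathcal{I}_k} (\mathbf{v}_{i_k}^{q_k})^H \mathbf{v}_{i_k}^{q_k} \Big) \ge 0 \tag{17}$$

where $\mu^{q_k}$ is the Lagrangian multiplier for BS $q_k$'s power budget constraint; $\partial(\|\mathbf{v}_{i_k}^{q_k}\|)$ represents the subdifferential
of the nonsmooth function $\|\cdot\|$ at the point $\mathbf{v}_{i_k}^{q_k}$. The latter can be expressed as follows (see [32], [43])

$$\partial(\|\mathbf{v}_{i_k}^{q_k}\|) = \begin{cases} \dfrac{\mathbf{v}_{i_k}^{q_k}}{\|\mathbf{v}_{i_k}^{q_k}\|}, & \mathbf{v}_{i_k}^{q_k} \ne \mathbf{0}, \\[2mm] \{\mathbf{x} \mid \|\mathbf{x}\| \le 1\}, & \mathbf{v}_{i_k}^{q_k} = \mathbf{0}. \end{cases} \tag{18}$$

Finding the global optimal solution of problem (P4) amounts to finding the optimal primal dual pair $\{(\mathbf{v}_{i_k}^{q_k})^*\}_{i_k \in \mathcal{I}_k}$, $(\mu^{q_k})^*$
that satisfy the conditions (16)–(17). In the following, we will first develop a procedure to find $\{\mathbf{v}_{i_k}^{q_k}\}_{i_k \in \mathcal{I}_k}$ that
satisfy the condition (16) for a given $\mu^{q_k} \ge 0$. Then we will use a bisection method to search for the optimal
multiplier.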

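Step 1) Utilizing the expression for the subdifferential in (18), the optimality condition (16) can be rewritten as

$$\mathbf{v}_{i_k}^{q_k} = \mathbf{0}, \quad \text{if } \|\mathbf{c}_{i_k}^{q_k}\| \le \frac{\lambda_k}{2}, \tag{19}$$

$$\mathbf{v}_{i_k}^{q_k} = \Big( \mathbf{J}_k[q,q] + \Big( \frac{\lambda_k \delta_{i_k}^{q_k}}{2} + \mu^{q_k} \Big) \mathbf{I}_M \Big)^{-1} \mathbf{c}_{i_k}^{q_k}, \quad \text{otherwise} \tag{20}$$

with $\delta_{i_k}^{q_k} > 0$ defined as $\delta_{i_k}^{q_k} \triangleq \frac{1}{\|\mathbf{v}_{i_k}^{q_k}\|}$. Note that (19) is the key to achieving sparsity, as whenever $\|\mathbf{c}_{i_k}^{q_k}\|$ is less than
the threshold $\lambda_k/2$, $\mathbf{v}_{i_k}^{q_k}$ will be forced to $\mathbf{0}$. The correctness of (20) can be checked by plugging the first case
of (18) into (16).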
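For completeness, the two cases of Step 1 can be worked out explicitly from the optimality condition (16) and the subdifferential (18). The derivation below is a sketch that uses only those two relations, with $\delta_{i_k}^{q_k} = 1/\|\mathbf{v}_{i_k}^{q_k}\|$:

```latex
\mathbf{v}_{i_k}^{q_k} \neq \mathbf{0}: \quad
-2\big(\mu^{q_k}\mathbf{v}_{i_k}^{q_k} + \mathbf{J}_k[q,q]\mathbf{v}_{i_k}^{q_k} - \mathbf{c}_{i_k}^{q_k}\big)
  = \lambda_k \frac{\mathbf{v}_{i_k}^{q_k}}{\|\mathbf{v}_{i_k}^{q_k}\|}
\;\Longrightarrow\;
\Big(\mathbf{J}_k[q,q] + \big(\tfrac{\lambda_k \delta_{i_k}^{q_k}}{2} + \mu^{q_k}\big)\mathbf{I}_M\Big)\mathbf{v}_{i_k}^{q_k} = \mathbf{c}_{i_k}^{q_k},

\mathbf{v}_{i_k}^{q_k} = \mathbf{0}: \quad
2\,\mathbf{c}_{i_k}^{q_k} \in \lambda_k \{\mathbf{x} \mid \|\mathbf{x}\| \le 1\}
\;\Longleftrightarrow\;
\|\mathbf{c}_{i_k}^{q_k}\| \le \frac{\lambda_k}{2}.
```

The first line is (20) after inverting the positive definite matrix on the left, and the second line is the thresholding rule (19).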



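By definition, the optimal $\delta_{i_k}^{q_k}$ must satisfy

$$h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k}) \triangleq \delta_{i_k}^{q_k} \Big\| \Big( \mathbf{J}_k[q,q] + \Big( \frac{\lambda_k \delta_{i_k}^{q_k}}{2} + \mu^{q_k} \Big) \mathbf{I}_M \Big)^{-1} \mathbf{c}_{i_k}^{q_k} \Big\| = 1. \tag{21}$$

Define the set of active users for BS $q_k$ as $\mathcal{A}^{q_k} \triangleq \{ i_k \mid i_k \in \mathcal{I}_k, \|\mathbf{c}_{i_k}^{q_k}\| > \frac{\lambda_k}{2} \}$, and define its cardinality as $|\mathcal{A}^{q_k}| = A^{q_k}$.
For any given $\mu^{q_k} \ge 0$, let us denote a beamformer $\mathbf{v}_{i_k}^{q_k}$ that satisfies (19)–(20) as $\mathbf{v}_{i_k}^{q_k}(\mu^{q_k})$, and the corresponding
$\delta_{i_k}^{q_k}$ that satisfies (21) as $\delta_{i_k}^{q_k}(\mu^{q_k})$. Clearly for a user $i_k$ that satisfies the condition (19) (i.e., $i_k \in \mathcal{I}_k \setminus \mathcal{A}^{q_k}$),
$\mathbf{v}_{i_k}^{q_k}(\mu^{q_k})$ does not depend on $\mu^{q_k}$ and can be directly computed. Let us then focus on the active users $i_k \in \mathcal{A}^{q_k}$.
For any $i_k \in \mathcal{A}^{q_k}$, finding a $\mathbf{v}_{i_k}^{q_k}(\mu^{q_k})$ amounts to obtaining the corresponding $\delta_{i_k}^{q_k}(\mu^{q_k})$ that satisfies (21). Due to a
certain monotonicity property of the function $h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k})$ w.r.t. $\delta_{i_k}^{q_k}$, a bisection search on $\delta_{i_k}^{q_k}$ can be used to find
$\delta_{i_k}^{q_k}(\mu^{q_k})$. This claim is established in Appendix B. Once $\delta_{i_k}^{q_k}(\mu^{q_k})$ is found for all $i_k \in \mathcal{A}^{q_k}$, we can use (20) to
find $\mathbf{v}_{i_k}^{q_k}(\mu^{q_k})$.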

Step 2) Once $\{\mathbf{v}_{i_k}^{q_k}(\mu^{q_k})\}_{i_k \in \mathcal{I}_k}$ is obtained, we need to search for the optimal $\mu^{q_k}$ that satisfies the feasibility
and the complementarity condition (17). The following result (the proof of which can be found in Appendix C)
suggests that there must exist a $\bar{\mu}^{q_k} > 0$ such that the optimal multiplier must lie in $[0, \bar{\mu}^{q_k}]$. Moreover, we can
perform a bisection search to find the optimal multiplier.

Lemma 1: For any set of $\{\mathbf{v}_{i_k}^{q_k}(\mu^{q_k})\}$ that satisfies (19)–(20), $\|\mathbf{v}_{i_k}^{q_k}(\mu^{q_k})\|$ is strictly decreasing w.r.t. $\mu^{q_k}$. Moreover,
there exists a $\bar{\mu}^{q_k}$ such that for all $\mu^{q_k} \ge \bar{\mu}^{q_k}$, $\sum_{i_k \in \mathcal{I}_k} \|\mathbf{v}_{i_k}^{q_k}(\mu^{q_k})\|^2 \le P_{q_k}$.

Performing Step 1) and Step 2) iteratively, we can find the desired optimal primal-dual pair for problem (P4).
Table I summarizes the above BCD procedure.
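The two nested bisection searches of Step 1) and Step 2) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes $\lambda_k > 0$, and it assumes that $\mathbf{J}_k[q,q]$ has been diagonalized so that it is represented by its nonnegative eigenvalues `d` (with $\mathbf{c}$ expressed in the same basis, which leaves all norms unchanged). The initial brackets are grown by doubling, which is a generic heuristic used here in place of explicit bounds, and all function names are hypothetical.

```python
import math

def h(delta, mu, d, c, lam):
    """h(delta, mu) = delta * ||(D + (lam*delta/2 + mu) I)^{-1} c|| with D = diag(d)."""
    shift = lam * delta / 2.0 + mu
    return delta * math.sqrt(sum(abs(cm) ** 2 / (dm + shift) ** 2 for dm, cm in zip(d, c)))

def solve_delta(mu, d, c, lam, tol=1e-12):
    """Inner bisection: find delta with h(delta, mu) = 1 (h is increasing in delta)."""
    lo, hi = 0.0, 1.0
    while h(hi, mu, d, c, lam) < 1.0:   # grow the bracket (assumed heuristic)
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid, mu, d, c, lam) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beamformer(mu, d, c, lam):
    """Solve the conditions (19)-(20) for one user at a fixed multiplier mu."""
    if math.sqrt(sum(abs(cm) ** 2 for cm in c)) <= lam / 2.0:
        return [0j] * len(c)            # thresholding step: this BS does not serve the user
    shift = lam * solve_delta(mu, d, c, lam) / 2.0 + mu
    return [cm / (dm + shift) for dm, cm in zip(d, c)]

def update_bs(d, cs, lam, P, tol=1e-10):
    """Outer bisection on mu so that the BS power budget P is respected."""
    def power(mu):
        return sum(abs(x) ** 2 for c in cs for x in beamformer(mu, d, c, lam))
    if power(0.0) <= P:                 # budget inactive, so mu = 0 by complementarity
        return [beamformer(0.0, d, c, lam) for c in cs], 0.0
    lo, hi = 0.0, 1.0
    while power(hi) > P:                # grow the bracket (assumed heuristic)
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power(mid) < P:
            hi = mid
        else:
            lo = mid
    return [beamformer(hi, d, c, lam) for c in cs], hi
```

As a usage example, `update_bs([2.0], [[3 + 4j]], 2.0, 1.0)` describes one single-antenna BS serving one user with a tight power budget. The returned beamformer has unit power and the multiplier is close to 2, which matches the scalar closed form derived later in this section.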

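It is worth noting that the algorithm in Table I admits a particularly simple form (without performing the bisection
steps) when there is a single user in each cell, and each BS has a single antenna. Let us again denote the only user
in the $k$th cell as user $k$. The procedure to find BS $q_k$'s scalar beamformer $v_k^{q_k} \in \mathbb{C}$ for user $k$ (i.e., Steps S4–S13 in
Table I) can be simplified as follows. Utilizing (19)–(20), if $|c_k| \le \frac{\lambda_k}{2}$, then $v_k^{q_k} = 0$. Otherwise, note that $J_k[q,q] \in \mathbb{R}_+$
in this case, and we have

$$v_k^{q_k} = \frac{c_k}{J_k[q,q] + \frac{\lambda_k \delta_k^{q_k}}{2} + \mu^{q_k}}, \quad \text{with} \quad
\frac{\delta_k^{q_k} |c_k|}{J_k[q,q] + \frac{\lambda_k \delta_k^{q_k}}{2} + \mu^{q_k}} = 1.$$

Consequently, we obtain a closed-form expression for $\delta_k^{q_k}$:

$$\delta_k^{q_k} = \frac{J_k[q,q] + \mu^{q_k}}{|c_k| - \frac{\lambda_k}{2}}. \tag{22}$$

Substituting this $\delta_k^{q_k}$ into (20), we obtain

$$v_k^{q_k} = \frac{|c_k| - \frac{\lambda_k}{2}}{J_k[q,q] + \mu^{q_k}} \, \frac{c_k}{|c_k|}$$

where the multiplier $\mu^{q_k}$ should be chosen such that the condition (17) is satisfied. In summary, we have the
following closed-form solution for updating $v_k^{q_k}$

$$v_k^{q_k} = \begin{cases}
0, & |c_k| \le \frac{\lambda_k}{2}, \\[1mm]
\dfrac{|c_k| - \frac{\lambda_k}{2}}{J_k[q,q]} \, \dfrac{c_k}{|c_k|}, & \frac{\lambda_k}{2} < |c_k| \le \frac{\lambda_k}{2} + \sqrt{P_{q_k}}\, J_k[q,q], \\[3mm]
\sqrt{P_{q_k}}\, \dfrac{c_k}{|c_k|}, & \text{otherwise.}
\end{cases} \tag{23}$$

TABLE I
The Procedure for Solving Problem (P3)

S1) Initialization: Generate a feasible set of beamformers $\{\mathbf{v}_{i_k}^{q_k}\}_{q_k \in \mathcal{Q}_k,\, i_k \in \mathcal{I}_k}$
S2) Compute $\mathbf{J}_k$ and $\mathbf{d}_{i_k}$ using (10) and (11)
S3) Repeat: Cyclically pick a BS $q_k \in \mathcal{Q}_k$
S4)   Compute $\mathbf{c}_{i_k}^{q_k}$ using (15) for each $i_k \in \mathcal{I}_k$
S5)   If $2\|\mathbf{c}_{i_k}^{q_k}\| \le \lambda_k$, set $\mathbf{v}_{i_k}^{q_k} = \mathbf{0}$
S6)   Else, choose $\underline{\mu}^{q_k}$ and $\bar{\mu}^{q_k}$ such that $(\mu^{q_k})^* \in [\underline{\mu}^{q_k}, \bar{\mu}^{q_k}]$
S7)     Repeat: $\mu^{q_k} \leftarrow (\underline{\mu}^{q_k} + \bar{\mu}^{q_k})/2$
S8)       For each $i_k \in \mathcal{I}_k$, choose $\underline{\delta}_{i_k}^{q_k}$ and $\bar{\delta}_{i_k}^{q_k}$ such that $\delta_{i_k}^{q_k}(\mu^{q_k}) \in [\underline{\delta}_{i_k}^{q_k}, \bar{\delta}_{i_k}^{q_k}]$
S9)       Repeat (for each $i_k \in \mathcal{I}_k$): $\delta_{i_k}^{q_k} \leftarrow (\underline{\delta}_{i_k}^{q_k} + \bar{\delta}_{i_k}^{q_k})/2$, $\mathbf{v}_{i_k}^{q_k} \leftarrow \big( \mathbf{J}_k[q,q] + ( \frac{\lambda_k \delta_{i_k}^{q_k}}{2} + \mu^{q_k} ) \mathbf{I}_M \big)^{-1} \mathbf{c}_{i_k}^{q_k}$
S10)        If $h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k}) < 1$, $\underline{\delta}_{i_k}^{q_k} \leftarrow \delta_{i_k}^{q_k}$; Otherwise, $\bar{\delta}_{i_k}^{q_k} \leftarrow \delta_{i_k}^{q_k}$
S11)      Until $|\bar{\delta}_{i_k}^{q_k} - \underline{\delta}_{i_k}^{q_k}| < \epsilon$
S12)      If $\sum_{i_k \in \mathcal{I}_k} \|\mathbf{v}_{i_k}^{q_k}\|^2 < P_{q_k}$, $\bar{\mu}^{q_k} \leftarrow \mu^{q_k}$; Otherwise, $\underline{\mu}^{q_k} \leftarrow \mu^{q_k}$
S13)    Until $|\bar{\mu}^{q_k} - \underline{\mu}^{q_k}| < \epsilon$
      End If
S14) Until the desired stopping criterion is met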



The complete algorithm for solving the regularized weighted MSE minimization problem (P2) is given in Table
II. We name this algorithm the sparse weighted MMSE (S-WMMSE) algorithm. The following theorem states its
convergence property. The proof can be found in Appendix D.

Theorem 1: The S-WMMSE algorithm converges to a stationary solution of problem (P1).

We remark that in a MISO network in which each user has a single antenna, the algorithm stated in Table II
can still be used, except that in this case the receiver $\mathbf{u}_{i_k}$ reduces to a scalar.



B. Parameter Selection 

In this subsection, we provide guidelines for choosing some key parameters for practical implementation of the 
proposed algorithm. 



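TABLE II
The S-WMMSE Algorithm For Solving Problem (P2)

S1) Initialization: Generate a feasible set of variables $\{v_{i_k}, u_{i_k}, w_{i_k}\}$
S2) Repeat
S3)    $u_{i_k} \leftarrow \big(\sum_{(\ell,j)} H_{i_k}^{\ell} v_{j_\ell} v_{j_\ell}^H (H_{i_k}^{\ell})^H + \sigma_{i_k}^2 I\big)^{-1} H_{i_k}^{k} v_{i_k}$, $\forall\, i_k$
S4)    $w_{i_k} \leftarrow \frac{d u_{i_k}(R_{i_k})}{d R_{i_k}}\Big|_{R_{i_k} = R_{i_k}(v)} \big(1 - (u_{i_k})^H H_{i_k}^{k} v_{i_k}\big)^{-1}$, $\forall\, i_k$
S5)    For each $k \in \mathcal{K}$, update $\{v^{q_k}\}_{q_k \in \mathcal{Q}_k}$ using Table I
S6) Until Desired stopping criteria is met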
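The receiver and weight updates of Table II can be illustrated with a small NumPy sketch. It assumes a toy setting that is not taken from the paper: one user per transmitter, the sum rate utility (so the derivative factor in the weight update equals 1), and randomly drawn channels. All function and variable names are hypothetical. The sketch also checks numerically that $\log(w)$ coincides with the achievable rate when $u$ is the MMSE receiver.

```python
import numpy as np

def received_covariance(H, v, sigma2):
    """C = sum_l H[l] v[l] v[l]^H H[l]^H + sigma2 * I at the user of interest."""
    N = H[0].shape[0]
    C = sigma2 * np.eye(N, dtype=complex)
    for Hl, vl in zip(H, v):
        x = Hl @ vl
        C += np.outer(x, x.conj())
    return C

def receiver_and_weight(H, v, k, sigma2):
    """MMSE receiver u, resulting MSE e, and sum-rate weight w = 1/e."""
    C = received_covariance(H, v, sigma2)
    s = H[k] @ v[k]                      # effective desired channel
    u = np.linalg.solve(C, s)            # u = C^{-1} H v
    e = 1.0 - np.vdot(u, s).real         # e = 1 - u^H H v (vdot conjugates u)
    return u, e, 1.0 / e

def rate(H, v, k, sigma2):
    """log det(I + s s^H Phi^{-1}) with Phi the interference-plus-noise covariance."""
    C = received_covariance(H, v, sigma2)
    s = H[k] @ v[k]
    Phi = C - np.outer(s, s.conj())
    G = np.eye(len(s)) + np.outer(s, s.conj()) @ np.linalg.inv(Phi)
    return np.log(np.linalg.det(G).real)

rng = np.random.default_rng(0)
N, M, n_tx = 2, 4, 3                     # toy dimensions (assumed)
H = [(rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
     for _ in range(n_tx)]
v = [(rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
     for _ in range(n_tx)]
u, e, w = receiver_and_weight(H, v, k=0, sigma2=1.0)
gap = abs(np.log(w) - rate(H, v, k=0, sigma2=1.0))   # should be at round-off level
```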



When $w$, $u$ are fixed, the procedure in Table I contains two bisection loops for solving for each $v^{q_k}$. The outer loop searches for the optimal $(\mu^{q_k})^* \in [\underline{\mu}^{q_k}, \bar{\mu}^{q_k}]$ that ensures $v^{q_k}((\mu^{q_k})^*)$ satisfies the complementarity and feasibility conditions (17). The inner loop searches for the optimal $\delta_{i_k}^{q_k}(\mu^{q_k}) \in [\underline{\delta}_{i_k}^{q_k}, \bar{\delta}_{i_k}^{q_k}]$ to ensure (21). In implementation, it is useful to have explicit expressions for the initial bounds of these variables.
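The two nested bisection loops can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes $h_{i_k}(\delta,\mu) = \delta \|B^{-1}(\delta,\mu) c_{i_k}\|$ with $B(\delta,\mu) = J_k[q,q] + (\lambda_k\delta/2 + \mu) I_M$ as in Appendix B, treats users with $\|c_{i_k}\| \le \lambda_k/2$ as inactive, uses a fixed number of bisection steps instead of a tolerance, and picks initial upper bounds in the spirit of the discussion below. The names `h`, `bisect` and `update_block` are hypothetical.

```python
import numpy as np

def h(delta, mu, J, c, lam):
    """h(delta, mu) = delta * ||(J + (lam*delta/2 + mu) I)^{-1} c||."""
    B = J + (lam * delta / 2 + mu) * np.eye(J.shape[0])
    return delta * np.linalg.norm(np.linalg.solve(B, c))

def bisect(root_is_above, lo, hi, iters=60):
    """Shrink [lo, hi] and return hi; root_is_above(x) tells that the root exceeds x."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if root_is_above(mid):
            lo = mid
        else:
            hi = mid
    return hi

def update_block(J, cs, lam, P):
    """Update the beamformers of one BS for all users, given J = J_k[q,q] and cs = [c_i]."""
    M = J.shape[0]
    vs = [np.zeros_like(c) for c in cs]
    active = [i for i, c in enumerate(cs) if np.linalg.norm(c) > lam / 2]
    if not active:
        return vs
    norms = [np.linalg.norm(cs[i]) for i in active]
    mu_hi = np.sqrt(len(active) / P) * max(norms)           # initial upper bound on mu
    delta_hi = (np.linalg.eigvalsh(J).max() + mu_hi) / (min(norms) - lam / 2)

    def beams(mu):
        out = []
        for i in active:
            # inner loop: find delta with h(delta, mu) = 1 (h is increasing in delta)
            d = bisect(lambda x: h(x, mu, J, cs[i], lam) < 1.0, 0.0, delta_hi)
            B = J + (lam * d / 2 + mu) * np.eye(M)
            out.append(np.linalg.solve(B, cs[i]))
        return out

    def power(mu):
        return sum(np.linalg.norm(v) ** 2 for v in beams(mu))

    # outer loop: mu = 0 if the power budget is already met, otherwise bisect on mu
    mu = 0.0 if power(0.0) <= P else bisect(lambda x: power(x) >= P, 0.0, mu_hi)
    for i, v in zip(active, beams(mu)):
        vs[i] = v
    return vs
```

With a diagonal `J` the result can be checked by hand, since each active user then reduces to the scalar closed form (23).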

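1) The Choice of Initial $\underline{\mu}^{q_k}$, $\bar{\mu}^{q_k}$: From the fact that $\mu^{q_k} \ge 0$, we can simply set the lower bound as $\underline{\mu}^{q_k} = 0$. For the initial upper bound $\bar{\mu}^{q_k}$, it is sufficient to guarantee that

$$\sum_{i_k \in \mathcal{I}_k} \|v_{i_k}^{q_k}(\bar{\mu}^{q_k})\|^2 \le P_{q_k}. \qquad (24)$$

To see this, recall that $\|v_{i_k}^{q_k}(\mu^{q_k})\|^2$ is monotonically decreasing w.r.t. $\mu^{q_k}$. Consequently, when (24) is satisfied, there must exist a $(\mu^{q_k})^* \in [0, \bar{\mu}^{q_k}]$ such that both the feasibility and complementarity conditions (17) are satisfied. To ensure (24), it is sufficient that such $\bar{\mu}^{q_k}$ satisfies the following condition for each active user $i_k \in \mathcal{A}^{q_k}$ (notice that the active set $\mathcal{A}^{q_k}$ is decided before bisection starts)

$$\|v_{i_k}^{q_k}(\bar{\mu}^{q_k})\| \le \sqrt{\frac{P_{q_k}}{|\mathcal{A}^{q_k}|}}.$$

For a specific $i_k \in \mathcal{A}^{q_k}$, we have the following inequalities

$$\|v_{i_k}^{q_k}(\bar{\mu}^{q_k})\| = \Big\| \Big(J_k[q,q] + \big(\tfrac{\lambda_k \delta_{i_k}^{q_k}}{2} + \bar{\mu}^{q_k}\big) I_M \Big)^{-1} c_{i_k}^{q_k} \Big\| \overset{(a)}{\le} \Big\| \big(J_k[q,q] + \bar{\mu}^{q_k} I_M \big)^{-1} c_{i_k}^{q_k} \Big\| \overset{(b)}{\le} \frac{\|c_{i_k}^{q_k}\|}{\bar{\mu}^{q_k}} \qquad (25)$$

where in (a) we have used the fact that the left-hand side of (25) is decreasing w.r.t. $\delta_{i_k}^{q_k}$; in (b) we have used the fact that $J_k[q,q] \succeq 0$.

As a result, it is sufficient to find a $\bar{\mu}^{q_k}$ such that $\frac{\|c_{i_k}^{q_k}\|}{\bar{\mu}^{q_k}} \le \sqrt{\frac{P_{q_k}}{|\mathcal{A}^{q_k}|}}$ for all $i_k \in \mathcal{A}^{q_k}$. This implies that the following choice ensures (24):

$$\bar{\mu}^{q_k} \ge \sqrt{\frac{|\mathcal{A}^{q_k}|}{P_{q_k}}} \max_{i_k \in \mathcal{A}^{q_k}} \|c_{i_k}^{q_k}\|. \qquad (26)$$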
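The chain of inequalities leading to the initial upper bound on the multiplier can be checked numerically. The sketch below assumes that the per-user power share is $P_{q_k}/|\mathcal{A}^{q_k}|$, which is one natural way to split the budget in (24) and is an assumption of this example; `mu_upper_bound` and `chain` are hypothetical helper names, and the matrix `J` and vectors `cs` are made-up test data.

```python
import numpy as np

def mu_upper_bound(cs, P):
    """One admissible initial upper bound: sqrt(|A| / P) * max_i ||c_i||."""
    return np.sqrt(len(cs) / P) * max(np.linalg.norm(c) for c in cs)

def chain(J, c, lam, delta, mu_bar):
    """Return the three quantities compared in (25) for a given delta >= 0."""
    I = np.eye(J.shape[0])
    lhs = np.linalg.norm(np.linalg.solve(J + (lam * delta / 2 + mu_bar) * I, c))
    mid = np.linalg.norm(np.linalg.solve(J + mu_bar * I, c))
    rhs = np.linalg.norm(c) / mu_bar
    return lhs, mid, rhs

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
J = A @ A.T                                   # a positive semidefinite J_k[q, q]
cs = [np.array([1.0, 0.0, 0.0, 0.0]),
      np.array([0.0, 2.0, 1.0, 0.0]),
      np.array([0.5, 0.5, 0.5, 0.5])]         # active users (toy data)
lam, P = 1.0, 2.0
mu_bar = mu_upper_bound(cs, P)
results = [chain(J, c, lam, d, mu_bar) for c in cs for d in (0.0, 0.5, 5.0)]
power_bound = sum((np.linalg.norm(c) / mu_bar) ** 2 for c in cs)  # upper bound on the sum in (24)
```

Each triple in `results` is nondecreasing from left to right, and `power_bound` does not exceed `P`.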



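2) The Choice of Initial $\underline{\delta}_{i_k}^{q_k}$, $\bar{\delta}_{i_k}^{q_k}$: Once the initial bounds on $\mu^{q_k}$ are chosen, we can determine the initial bounds for each $\delta_{i_k}^{q_k}$, $i_k \in \mathcal{A}^{q_k}$. Because $\delta_{i_k}^{q_k} \ge 0$, the lower bound can be simply set as $\underline{\delta}_{i_k}^{q_k} = 0$. Next, we find the $\bar{\delta}_{i_k}^{q_k}$ that is suitable for all $\mu^{q_k} \in [\underline{\mu}^{q_k}, \bar{\mu}^{q_k}]$. From the proof of Lemma 3 we see that it is sufficient to choose the initial bound $\bar{\delta}_{i_k}^{q_k}$ such that

$$h_{i_k}(\bar{\delta}_{i_k}^{q_k}, \mu^{q_k}) \ge 1, \quad \forall\, \mu^{q_k} \in [\underline{\mu}^{q_k}, \bar{\mu}^{q_k}], \ \forall\, i_k \in \mathcal{A}^{q_k}. \qquad (27)$$

In this way, for each $\mu^{q_k} \in [\underline{\mu}^{q_k}, \bar{\mu}^{q_k}]$ there is a $\delta_{i_k}^{q_k}(\mu^{q_k}) \in [0, \bar{\delta}_{i_k}^{q_k}]$ that ensures $h_{i_k}(\delta_{i_k}^{q_k}(\mu^{q_k}), \mu^{q_k}) = 1$.

We have the following series of inequalities bounding $h_{i_k}(\bar{\delta}_{i_k}^{q_k}, \mu^{q_k})$ for any $\mu^{q_k} \in [\underline{\mu}^{q_k}, \bar{\mu}^{q_k}]$:

$$h_{i_k}(\bar{\delta}_{i_k}^{q_k}, \mu^{q_k}) \ge h_{i_k}(\bar{\delta}_{i_k}^{q_k}, \bar{\mu}^{q_k}) \ge \bar{\delta}_{i_k}^{q_k} \, \frac{\|c_{i_k}^{q_k}\|}{\rho(J_k[q,q]) + \frac{\lambda_k \bar{\delta}_{i_k}^{q_k}}{2} + \bar{\mu}^{q_k}} \qquad (28)$$

where the first inequality is due to the monotonicity of $h_{i_k}(\bar{\delta}_{i_k}^{q_k}, \mu^{q_k})$ w.r.t. $\mu^{q_k}$. Using (28), we can show that the following choice is a sufficient condition for (27):

$$\bar{\delta}_{i_k}^{q_k} \ge \frac{\rho(J_k[q,q]) + \bar{\mu}^{q_k}}{\|c_{i_k}^{q_k}\| - \frac{\lambda_k}{2}}, \quad \forall\, i_k \in \mathcal{A}^{q_k}.$$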
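The sufficient condition obtained from (28) can also be verified numerically. The sketch below assumes the form $h_{i_k}(\delta,\mu) = \delta\|B^{-1}(\delta,\mu)c_{i_k}\|$ with $B$ as defined in Appendix B, takes $\rho(\cdot)$ to be the largest eigenvalue, and uses made-up data; `delta_upper_bound` is a hypothetical helper that returns a single bound valid for all active users.

```python
import numpy as np

def h(delta, mu, J, c, lam):
    """h(delta, mu) = delta * ||(J + (lam*delta/2 + mu) I)^{-1} c||."""
    B = J + (lam * delta / 2 + mu) * np.eye(J.shape[0])
    return delta * np.linalg.norm(np.linalg.solve(B, c))

def delta_upper_bound(J, cs, lam, mu_bar):
    """max over active users of (rho(J) + mu_bar) / (||c_i|| - lam/2)."""
    rho = np.linalg.eigvalsh(J).max()
    return max((rho + mu_bar) / (np.linalg.norm(c) - lam / 2) for c in cs)

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
J = A @ A.T                                   # positive semidefinite J_k[q, q]
cs = [np.array([1.0, 0.0, 0.0, 0.0]),
      np.array([0.0, 2.0, 1.0, 0.0]),
      np.array([0.5, 0.5, 0.5, 0.5])]         # all norms exceed lam / 2
lam, mu_bar = 1.0, 3.0
delta_bar = delta_upper_bound(J, cs, lam, mu_bar)
# smallest value of h(delta_bar, mu) over a grid of mu in [0, mu_bar] and all users
min_h = min(h(delta_bar, mu, J, c, lam)
            for mu in np.linspace(0.0, mu_bar, 50) for c in cs)
```

Since `min_h` is at least 1, the interval `[0, delta_bar]` brackets the root of $h = 1$ for every multiplier in the search range, which is what the inner bisection needs.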



C. Distributed Implementation 

Suppose that there is some central entity, say a macro BS, managing the downlink resource allocation for each cell. Then under the following assumptions, the proposed algorithm can be implemented in a distributed manner by each macro BS.

A-1) each macro BS $k$ knows the channels from the BSs in its cell to all the users in $\mathcal{I}$;

A-2) each user has an additional channel to feed back information to its current serving BS;

A-3) different macro BSs can exchange control information.
Under these assumptions, in each iteration of the algorithm, a user $i_k$ can measure the covariance of the received signal $C_{i_k}$ and update its weight and receive beamformer $w_{i_k}$, $u_{i_k}$, respectively. It then feeds these variables back to one of its serving BSs, which in turn forwards them to the macro BS. Each pair of macro BSs then exchange their respective users' current beamformers. With these pieces of information, all macro BSs can carry out the procedure in Table I independently. The newly computed beamforming and clustering decisions are subsequently distributed to the BSs in their respective cells via low-latency backhaul links.

In practice, considering the costs of obtaining and sharing the channel state information, the sparse clustering algorithm may only need to be executed in its full generality once every several transmission time intervals (TTIs). During the TTIs in which the clustering is kept fixed, one can either keep updating the beamformers (by solving problem (P1) without the regularization term), or even fix the beamformers.

V. Numerical Results 

In this section, we perform numerical evaluation of our proposed algorithm. We consider a multicell network of up to 10 cells. The distance between the centers of two adjacent cells is set to 2000 meters (see Fig. 1 for an illustration). We place the BSs and users randomly in each cell. Let $d_{i_k}^{q_\ell}$ denote the distance between BS $q_\ell$ and user $i_k$. The channel coefficients between user $i_k$ and BS $q_\ell$ are modeled as a zero mean circularly symmetric complex Gaussian vector with $(200/d_{i_k}^{q_\ell}) L_{i_k}^{q_\ell}$ as variance for both real and imaginary dimensions, where $10\log_{10}(L_{i_k}^{q_\ell}) \sim \mathcal{N}(0, 64)$ is a real Gaussian random variable modeling the shadowing effect. We fix the environment noise power for all the users as $\sigma_{i_k}^2 = 1$, fix the power budget of each BS as $P_{q_k} = P$, and fix the number of BSs and the number of users in each cell as $|\mathcal{Q}_k| = Q$, $|\mathcal{I}_k| = I$. We define SNR = PQ.

In Fig. 2 we illustrate the structure of the overlapping clusters generated by the S-WMMSE algorithm in a simple single cell network. In this example, when no group-sparsity is considered, each user is served by all the BSs in the set $\mathcal{Q}_1$. In contrast, when the clusters are formed by performing the proposed algorithm, the cluster sizes are significantly reduced. In Fig. 3 we show the averaged number of iterations¹ needed for the proposed algorithm to reach convergence in different network scenarios. The stopping criterion is set as $|f(v^{t+1}) - f(v^{t})| < 10^{-1}$, where $f(\cdot)$ represents the objective value of problem (P1). For both of these results, the sum rate utility is used.

Our experiments mainly compare the proposed algorithm with the following three algorithms: 

• WMMSE with full intra-cell and limited inter-cell coordination [24]: In this algorithm, the network is modeled as a MIMO-IBC where all the BSs $\mathcal{Q}_k$ in cell $k$ collectively form a giant virtual BS with the transmit power pooled together. It is shown in [24] that this algorithm compares favorably to other popular beamformer

¹ Note that the number of iterations refers to the outer iterations specified by the update in Table II.
Fig. 1. Cell configuration for numerical experiments.



design algorithms such as the iterative pricing algorithm [17]. In the present paper, it corresponds to the case 
where a single cluster is formed in each cell. This algorithm serves as a performance upperbound (in terms 
of throughput) of the proposed algorithm. 

• ZF beamforming with heuristic BS clustering: In this algorithm, cooperation clusters with fixed sizes are formed within each cell, and each cluster performs ZF beamforming. The clusters are formed greedily by choosing an initial BS among the unclustered BSs and adding its nearest BSs until reaching the prescribed cluster size. The users are assigned to the cluster with the strongest direct channel (in terms of 2-norm). Each cluster serves its associated users by a single cell ZF linear beamforming [8]². To ensure feasibility of the per-cluster ZF scheme, the weakest users in terms of direct channel are dropped when infeasibility arises.

• WMMSE with each user served by its nearest BS: In this algorithm, each user is assigned to the nearest BS, 
that is, the size of the coordination cluster is at most 1. We denote this algorithm as WMMSE-nearest neighbor 
(NN). 

We first consider a network with $K = 4$, $I = 40$, $Q = 20$, $M = 4$, $N = 2$. We use system sum rate as the utility function. The achieved system sum rate and the averaged number of serving BSs are shown in Fig. 4-Fig. 5 for different algorithms. Each point in the figures is an average of 100 runs of the algorithms over randomly generated networks. Notice that for the ZF based scheme, although the cluster size is given and fixed, the actual number of serving BSs per user can be smaller than the cluster size, as some users may not be served by all the BSs in their serving cluster. It can be seen from Fig. 4-Fig. 5 that the system throughput obtained by the proposed S-WMMSE algorithm is close to what is achievable by the full cooperation. Moreover, the high throughput is achieved using moderate cluster sizes. Notice that the proposed algorithm compares favorably even with the full per-cell ZF scheme (with cluster size 20). This suggests that the inter-cluster interference should be carefully taken into consideration when jointly optimizing the BS clusters and beamformers.

It is important to emphasize that the parameters $\{\lambda_k\}_{k=1}^{K}$ in the proposed algorithm balance the sizes of the clusters and the system throughput. For different network configurations they need to be properly chosen to yield the best tradeoff. Empirically, we found that setting $\lambda_k = \frac{QK}{I\sqrt{\text{SNR}}}$ gives a satisfactory tradeoff (as illustrated in Fig. 4-Fig. 5). This is partly because choosing $\lambda_k$ inversely proportional to $\sqrt{\text{SNR}}$ can better balance the relative importance of the penalization term and the sum rate term when SNR becomes large. To better select this parameter for different system settings, we provide an alternative scheme that adaptively computes $\{\lambda_k\}_{k=1}^{K}$ in each iteration of the algorithm. Note that for the quadratic problem (P4), if $\lambda_k$ is chosen large enough such that $\lambda_k \ge \bar{\lambda}_k = 2 \times \max_{q \in \mathcal{Q}_k, i_k \in \mathcal{I}_k} \|d_{i_k}[q]\|$, then $v_{i_k} = 0$, $\forall\, i_k \in \mathcal{I}_k$. This result can be straightforwardly derived from

² Note that in [8], after the beams are calculated and fixed, the power allocation for different beams/streams is determined by solving a convex vector optimization with a sum-power constraint. In our simulation, we replace the sum-power constraint with a set of per-group-of-antennas power constraints to better fit the multi-BS setting.
Fig. 2. Illustration of coordination status of running the S-WMMSE algorithm for a single cell network. A line connecting a BS and a user means this BS currently serves this user. The black circles indicate the coordination clusters generated by the algorithm. $K = 1$, $M = 4$, $N = 2$, $|\mathcal{I}_1| = 3$, $|\mathcal{Q}_1| = 5$. The sum rate utility is used.

Fig. 3. Comparison of the number of iterations needed for convergence with different network sizes. $K = \{4, 10\}$, $M = 4$, $N = 2$, $\lambda_k = \frac{QK}{I\sqrt{\text{SNR}}}$, $\forall\, k$. The sum rate utility is used.
Fig. 4. Comparison of the system throughput achieved by different algorithms. $K = 4$, $M = 4$, $N = 2$, $|\mathcal{I}_k| = 40$, $|\mathcal{Q}_k| = 20$, the sum rate utility is used. For the S-WMMSE algorithm, $\lambda_k = \frac{QK}{I\sqrt{\text{SNR}}}$, $\forall\, k$.
Fig. 5. Comparison of the averaged number of BSs serving each user for different algorithms. $K = 4$, $M = 4$, $N = 2$, $|\mathcal{I}_k| = 40$, $|\mathcal{Q}_k| = 20$, the sum rate utility is used. For the S-WMMSE algorithm, $\lambda_k = \frac{QK}{I\sqrt{\text{SNR}}}$, $\forall\, k$.



the optimality condition (19). For the conventional quadratic LASSO problem, the fixed sparsity parameter $\lambda_k$ can be chosen as $c\bar{\lambda}_k$, where $0 < c < 1$ is a small number, see, e.g., [36]. In our experiments, we found that choosing $\lambda_k$ as $\lambda_k$ = min {ngjjjrS 1} works well for all network configurations. The performance of the S-WMMSE algorithm with this adaptive choice of $\lambda_k$ is also demonstrated in Fig. 4-Fig. 5. Clearly such an adaptive choice of $\lambda_k$ can generate smaller cluster sizes while achieving similar performance as its fixed parameter counterparts. Note that the convergence proof for the proposed algorithm no longer applies, as it requires that the parameters $\{\lambda_k\}$ be fixed during the iterations (although in simulation experiments we observe that this adaptive algorithm usually converges).
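The role of the threshold $\bar{\lambda}_k$ can be illustrated numerically. The sketch below assumes, purely for illustration, a real-valued single-user group-LASSO objective of the form $v^T J v - 2 d^T v + \lambda \sum_q \|v[q]\|$; this form, the helper names `lambda_bar` and `objective`, and the random data are assumptions of the example rather than quantities taken from the paper. With $\lambda = \bar{\lambda}$ no trial point improves on $v = 0$, whereas a much smaller $\lambda$ admits a nonzero point with a strictly lower objective.

```python
import numpy as np

def lambda_bar(d_blocks):
    """Threshold 2 * max_q ||d[q]|| above which the all-zero solution is optimal."""
    return 2.0 * max(np.linalg.norm(dq) for dq in d_blocks)

def objective(v_blocks, J, d_blocks, lam):
    """v^T J v - 2 d^T v + lam * sum_q ||v[q]|| (assumed illustrative form)."""
    v = np.concatenate(v_blocks)
    d = np.concatenate(d_blocks)
    penalty = lam * sum(np.linalg.norm(vq) for vq in v_blocks)
    return v @ J @ v - 2.0 * d @ v + penalty

rng = np.random.default_rng(3)
Q, M = 3, 2                                   # number of BS blocks and antennas per BS (toy)
A = rng.standard_normal((Q * M, Q * M))
J = A @ A.T                                   # positive semidefinite quadratic term
d_blocks = [rng.standard_normal(M) for _ in range(Q)]
lam_hi = lambda_bar(d_blocks)

# Random trial points never beat v = 0 (objective value 0) when lam = lam_hi.
trials = [[rng.standard_normal(M) for _ in range(Q)] for _ in range(200)]
worst = min(objective(v, J, d_blocks, lam_hi) for v in trials)

# With a much smaller lam, a small step along d already gives a negative objective.
small_step = [1e-3 * dq for dq in d_blocks]
improved = objective(small_step, J, d_blocks, 0.01 * lam_hi)
```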

We also consider a larger network with $K = 10$, $I = Q = 20$, $M = 4$ and $N = 2$, and choose to optimize the proportional fairness utility defined as $u_{i_k}(R_{i_k}) = \log(R_{i_k})$. In Fig. 6-Fig. 9 we compare the performance of
Fig. 6. Comparison of the system throughput achieved by different algorithms. $K = 10$, $M = 4$, $N = 2$, $|\mathcal{I}_k| = 20$, $|\mathcal{Q}_k| = 20$, the PF utility is used. $\lambda_k$ is specified in the legend.
Fig. 7. Comparison of relative per-BS transmission power used (relative to the power consumption of WMMSE algorithm with full per-cell cooperation). $K = 10$, $M = 4$, $N = 2$, $|\mathcal{I}_k| = 20$, $|\mathcal{Q}_k| = 20$, PF utility is used. $\lambda_k$ is specified in the legend.
Fig. 8. Comparison of the averaged cluster sizes generated by different algorithms. $K = 10$, $M = 4$, $N = 2$, $|\mathcal{I}_k| = 20$, $|\mathcal{Q}_k| = 20$, PF utility is used. $\lambda_k$ is specified in the legend.

Fig. 9. Comparison of distribution of the users' individual transmission rates achieved by different algorithms. $K = 10$, $M = 4$, $N = 2$, $|\mathcal{I}_k| = 20$, $|\mathcal{Q}_k| = 20$, PF utility is used. For the S-WMMSE algorithm, $\lambda_k = 0.1$.



the WMMSE algorithm and the WMMSE-NN algorithm³ with the proposed algorithm for different choices of $\lambda_k$. In order to highlight the role of $\lambda_k$ in balancing the system throughput and the cluster sizes, we show in these figures the performance of the proposed algorithm with fixed sparsity parameter $\lambda_k$ for all SNR values. In Fig. 7 we plot the averaged per-BS power consumption relative to that of the WMMSE algorithm. In Fig. 9 we plot the distribution of the individual users' rates generated by these algorithms. Clearly the proposed algorithm is able to achieve high levels of system throughput and fairness by only using small cluster sizes and significantly lower transmission power (the reduction of transmission power can also be attributed to the use of penalization, see Fig. 7). Additionally, in Fig. 6 and Fig. 8 we include the performance of a limited cooperation scheme in which each cell only coordinates with its nearest neighbor, while treating the signals of the remaining cells as thermal noise (this scheme is labeled as "Sparse-Neighbor"). We observed that this scheme has similar system throughput as the original one, but results in larger cluster sizes. Such an increase in cluster size can be seen as a compensation adopted by the "Sparse-Neighbor" algorithm for ignoring certain inter-cell interference. Due to the limited coordination among the cells, the convergence of this scheme is not theoretically guaranteed. However, in simulation we found that the algorithm usually converges.

³ We do not consider the ZF scheme in this experiment for the reason that it cannot guarantee that all the users in the system are served simultaneously, as required by the solution of the proportional fair utility maximization problem.



VI. Concluding Remarks 



In this work, we propose to jointly optimize the BS clustering and downlink linear beamformer in a large scale HetNet by solving a nonsmooth utility maximization problem. A key observation that motivates this work is that when all the BSs in each cell are viewed as a single virtual BS, the limited coordination strategy that requires only a few BSs to jointly transmit to a user is equivalent to a group-sparsity structure of the virtual BSs' beamformers. We effectively incorporate such group-sparsity into our beamformer design by penalizing the system utility function using a mixed $\ell_2/\ell_1$ norm. We derive a useful equivalent reformulation of this nonsmooth utility maximization problem, which facilitates the design of an efficient iterative group-LASSO based algorithm. Simulation results show that the proposed algorithm is able to select a few serving BSs for each user, while incurring minor loss in terms of system throughput and/or user fairness.

Our framework can also be extended to the scenario in which multiple streams are transmitted to each user. In this more general case, a precoding matrix is used by each BS for each user. To induce sparsity, the utility function should be penalized by the Frobenius norms of the precoding matrices. All the equivalence results derived in Section III hold true for this general case, while the algorithm needs to be properly tailored. We also expect that the proposed approach can be extended to other related problems such as the design of coordinated transceivers in an uplink HetNet, or the design of antenna selection algorithms for large scale distributed antenna systems.
Appendix A
Proof of Proposition 1 and Proposition 2
Proof of Proposition 1: Let $I(v_k^{q_k})$ denote the (nonsmooth) indicator function for the feasible space of vector $v_k^{q_k}$, i.e.,

$$I(v_k^{q_k}) = \begin{cases} 0, & \|v_k^{q_k}\|^2 \le P_{q_k}, \\ -\infty, & \text{otherwise.} \end{cases}$$

We can then rewrite problem (HJ compactly as an unconstrained nonsmooth optimization problem 



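$$\max_{\{v_k^{q_k}\}} \ \sum_{k \in \mathcal{K}} \Big( R_k - \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|v_k^{q_k}\| + \sum_{q_k \in \mathcal{Q}_k} I(v_k^{q_k}) \Big) \triangleq \max_{v} f_R(v) \qquad (29)$$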



and rewrite © equivalently as 



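$$\min_{\{v_k^{q_k}\},\{u_k\},\{w_k\}} \ \sum_{k \in \mathcal{K}} \Big( w_k e_k - \log(w_k) + \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|v_k^{q_k}\| - \sum_{q_k \in \mathcal{Q}_k} I(v_k^{q_k}) \Big) \triangleq \min_{\{v_k^{q_k}\},\{u_k\},\{w_k\}} f_{\rm mse}(v, u, w). \qquad (30)$$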

We first claim that the function $f_R(\cdot)$ is regular in the sense of Definition 3 under the block structure $v = \{v_k^{q_k}\}_{q_k \in \mathcal{Q}_k, k \in \mathcal{K}}$. This claim can be verified by observing that under such block structure, the nonsmooth parts of the function $f_R(\cdot)$ are separable across blocks, and the smooth part of $f_R(\cdot)$ is differentiable. This property ensures that a coordinatewise stationary point $v$ of $f_R(\cdot)$ is also a stationary point. See [42, Lemma 3.1] for a derivation. Similarly, the function $f_{\rm mse}(\cdot)$ is regular under the following block structure

$$v = \{v_k^{q_k}\}_{q_k \in \mathcal{Q}_k, k \in \mathcal{K}}, \quad u = \{u_k\}_{k \in \mathcal{K}}, \quad w = \{w_k\}_{k \in \mathcal{K}}.$$
Now assume that $(v^*, u^*, w^*)$ is a stationary solution of problem (30), then we must have

$$f'_{\rm mse}(v^*, u^*, w^*; (0, \cdots, d_{u_k}, 0, \cdots, 0)) \ge 0, \quad \forall\, d_{u_k}, \ \forall\, k \qquad (31)$$

$$f'_{\rm mse}(v^*, u^*, w^*; (0, \cdots, d_{w_k}, 0, \cdots, 0)) \ge 0, \quad \forall\, d_{w_k}, \ \forall\, k \qquad (32)$$

where $(0, \cdots, d_{u_k}, 0, \cdots, 0)$ is a vector of zero entries except for the block corresponding to the variable $u_k$, which takes the value $d_{u_k}$. Condition (31) implies that $u_k^*$ is the unconstrained local minimum of the function $f_{\rm mse}(v^*, [u_k, u_{-k}^*], w^*)$. The same is true for $w_k$. Notice that $f_{\rm mse}(\cdot)$ is smooth w.r.t. $u_k$ and $w_k$; consequently, we must have

$$\nabla_{u_k} f_{\rm mse}(v^*, u^*, w^*) = 0, \quad \frac{\partial f_{\rm mse}(v^*, u^*, w^*)}{\partial w_k} = 0, \quad \forall\, k \in \mathcal{K}.$$

The above two sets of conditions imply that

$$u_k^* = C_k^{-1} H_k^k v_k^*,$$

$$w_k^* = \frac{1}{e_k^*} = \big(1 - (v_k^*)^H (H_k^k)^H C_k^{-1} H_k^k v_k^*\big)^{-1}.$$

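In the sequel we will occasionally use $u_k^*(v^*)$ and $w_k^*(v^*)$ to emphasize their dependencies on $v^*$. Using these two expressions, we have

$$f_{\rm mse}(v^*, u^*, w^*) = \sum_{k \in \mathcal{K}} \Big( 1 - \log\big((e_k^*)^{-1}\big) + \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|v_k^{q_k *}\| - \sum_{q_k \in \mathcal{Q}_k} I(v_k^{q_k *}) \Big) \overset{(a)}{=} K - f_R(v^*) \qquad (33)$$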

where in (a) we have used the matrix inversion lemma iPPfl to obtain 



$$\log\big((e_k^*)^{-1}\big) = \log\Big(1 - (v_k^*)^H (H_k^k)^H C_k^{-1}(v^*) H_k^k v_k^*\Big)^{-1} = R_k(v^*).$$
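For completeness, one way to verify step (a) is sketched below. It assumes that the rate takes the usual log-determinant form in terms of the interference-plus-noise covariance; the shorthands $s$, $\Phi_k$ and $a$ are introduced here for illustration only and do not appear elsewhere in the paper.

```latex
\begin{align*}
s &\triangleq H_k^k v_k^*, \qquad \Phi_k \triangleq C_k - s s^H, \qquad a \triangleq s^H \Phi_k^{-1} s, \\
C_k^{-1} &= (\Phi_k + s s^H)^{-1} = \Phi_k^{-1} - \frac{\Phi_k^{-1} s s^H \Phi_k^{-1}}{1 + a}, \\
e_k^* &= 1 - s^H C_k^{-1} s = 1 - a + \frac{a^2}{1 + a} = \frac{1}{1 + a}, \\
\log\big((e_k^*)^{-1}\big) &= \log(1 + a) = \log\det\big(I + s s^H \Phi_k^{-1}\big).
\end{align*}
```

The second line is the matrix inversion (Sherman-Morrison) identity applied to the rank-one term $s s^H$, and the last equality uses $\det(I + x y^H) = 1 + y^H x$.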
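Using Definition 1 we write the stationarity condition of problem (30) w.r.t. each component $v_k^{q_k}$ as

$$f'_{\rm mse}(v^*, u^*, w^*; (0, \cdots, d_{v_k^{q_k}}, 0, \cdots, 0)) \ge 0, \quad \forall\, d_{v_k^{q_k}}, \ \forall\, q_k \in \mathcal{Q}_k, \ \forall\, k. \qquad (34)$$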

Using Danskin's Theorem P31 and the fact that (w*,u*) = arg min / mso ( v* , u, w), we have 

$$f'_{\rm mse}(v^*, u^*, w^*; (0, \cdots, d_{v_k^{q_k}}, 0, \cdots, 0)) = -f'_R(v^*; (0, \cdots, d_{v_k^{q_k}}, 0, \cdots, 0)), \quad \forall\, d_{v_k^{q_k}}, \ \forall\, q_k \in \mathcal{Q}_k, \ \forall\, k \in \mathcal{K}.$$

Combining (34) and the regularity of $f_R(v)$ given in Definition 3, we conclude that

$$-f'_R(v^*; d_{v_k}) \ge 0, \quad \forall\, d_{v_k} = \big(d_{v_k^{1_k}}, \cdots, d_{v_k^{Q_k}}\big). \qquad (35)$$

According to Definition 1, $v^*$ satisfies the stationarity condition for problem (29). The reverse direction can be obtained using the same argument.

The equivalence of the global optimal solutions of the two problems can be argued as follows. Suppose $(v^*, u^*, w^*)$ is a global optimal solution of $\min f_{\rm mse}(v, u, w)$ but $v^*$ is not a global optimal solution of $\max f_R(v)$. Then there must exist a $\hat{v}$ such that $f_R(\hat{v}) > f_R(v^*)$. Using (33), we must have $f_{\rm mse}(v^*, u^*, w^*) = K - f_R(v^*) > K - f_R(\hat{v})$. Notice that when plugging $\hat{v}, u^*(\hat{v}), w^*(\hat{v})$ into $f_{\rm mse}(\cdot)$, we again have

$$f_{\rm mse}(\hat{v}, u^*(\hat{v}), w^*(\hat{v})) = K - f_R(\hat{v}). \qquad (36)$$

Therefore, we have

$$f_{\rm mse}(v^*, u^*, w^*) > f_{\rm mse}(\hat{v}, u^*(\hat{v}), w^*(\hat{v})), \qquad (37)$$

a contradiction to the global optimality of $(v^*, u^*, w^*)$. This completes the second part of the claim. ■

Proof of Proposition 2 (sketch): We first show that the function $\gamma_{i_k}(\cdot)$ is well defined. From our assumption on the utility function $u_{i_k}(\cdot)$, we see that $u_{i_k}(-\log(e_{i_k}))$ is a strictly convex function in $e_{i_k}$ for all $e_{i_k} > 0$. This ensures that $-\frac{d u_{i_k}(-\log(e_{i_k}))}{d e_{i_k}}$ is a strictly decreasing function. Consequently, its inverse function is well defined. Assume that $(v^*, u^*, w^*)$ is a stationary solution to problem (P2). Following the steps of the proof of Proposition 1, we can show that $w^*$ is of the following form

$$w_{i_k}^* = \frac{d u_{i_k}(R_{i_k})}{d R_{i_k}} \Big|_{R_{i_k} = R_{i_k}(v^*)} \frac{1}{e_{i_k}^*}.$$

The rest of the proof is the same as that of Proposition 1. We omit it due to space limit.



Appendix B 

We first show a monotonicity property of $h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k})$.

Lemma 2: Suppose $\|c_{i_k}^{q_k}\| > \frac{\lambda_k}{2}$. Then for fixed $\delta_{i_k}^{q_k} > 0$, $h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k})$ is a strictly decreasing function of $\mu^{q_k}$. For fixed $\mu^{q_k} \ge 0$, $h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k})$ is a strictly increasing function of $\delta_{i_k}^{q_k}$.

Proof: Define $B(\delta_{i_k}^{q_k}, \mu^{q_k}) = J_k[q,q] + (\frac{\lambda_k \delta_{i_k}^{q_k}}{2} + \mu^{q_k}) I_M$, and write $B = B(\delta_{i_k}^{q_k}, \mu^{q_k})$ for brevity. Then we have

$$\frac{\partial h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k})}{\partial \mu^{q_k}} = -\delta_{i_k}^{q_k} \, \|B^{-1} c_{i_k}^{q_k}\|^{-1} \, {\rm Tr}\big[B^{-1} B^{-1} c_{i_k}^{q_k} (c_{i_k}^{q_k})^H B^{-H}\big].$$

Notice that $J_k[q,q] \succeq 0$ because it is a principal submatrix of a positive semidefinite matrix $J_k$. This fact combined with $\delta_{i_k}^{q_k} > 0$ ensures $B \succ 0$ and $B^{-1} \succ 0$. Using the fact that $c_{i_k}^{q_k} \ne 0$ and $B^{-1} \succ 0$, we have

$${\rm Tr}\big[B^{-1} B^{-1} c_{i_k}^{q_k} (c_{i_k}^{q_k})^H B^{-H}\big] = (c_{i_k}^{q_k})^H B^{-H} B^{-1} B^{-1} c_{i_k}^{q_k} > 0.$$

This condition ensures $\frac{\partial h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k})}{\partial \mu^{q_k}} < 0$, $\forall\, \delta_{i_k}^{q_k} > 0$, which in turn implies the desired monotonicity.

The second part of the lemma can be shown similarly.
Utilizing Lemma 2 we can show that $\delta_{i_k}^{q_k}(\mu^{q_k})$ always exists.



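Lemma 3: Suppose the condition $\|c_{i_k}^{q_k}\| > \frac{\lambda_k}{2}$ is satisfied. Then for any fixed $\mu^{q_k}$ that satisfies $0 \le \mu^{q_k} < \infty$, there always exists a $\delta_{i_k}^{q_k}(\mu^{q_k})$ that satisfies (21) for all $i_k \in \mathcal{A}^{q_k}$.

Proof: Pick an $i_k \in \mathcal{A}^{q_k}$. First notice that $h_{i_k}(0, \mu^{q_k}) = 0$, $\forall\, 0 \le \mu^{q_k} < \infty$. We then show that $\lim_{\delta_{i_k}^{q_k} \to \infty} h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k}) > 1$, $\forall\, 0 \le \mu^{q_k} < \infty$. To this end, we can write

$$\lim_{\delta_{i_k}^{q_k} \to \infty} h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k}) \ge \lim_{\delta_{i_k}^{q_k} \to \infty} \delta_{i_k}^{q_k} \, \frac{\|c_{i_k}^{q_k}\|}{\rho(J_k[q,q]) + \frac{\lambda_k \delta_{i_k}^{q_k}}{2} + \mu^{q_k}} = \frac{\|c_{i_k}^{q_k}\|}{\lambda_k / 2} > 1$$

where the last inequality is from the assumption.

Combining these two facts with the monotonicity of $h_{i_k}(\delta_{i_k}^{q_k}, \mu^{q_k})$ w.r.t. $\delta_{i_k}^{q_k}$ (Lemma 2), we conclude from continuity that there must exist $0 < \delta_{i_k}^{q_k}(\mu^{q_k}) < \infty$ such that (21) is satisfied. ■

The proof of Lemma 3 is constructive, as it ensures that a bisection method can find $\delta_{i_k}^{q_k}(\mu^{q_k})$.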



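Appendix C
Proof of Lemma 1

Fix a given $\mu^{q_k} \ge 0$, pick $i_k \in \mathcal{A}^{q_k}$. Suppose $\delta_{i_k}^{q_k}(\mu^{q_k})$ satisfies $h_{i_k}(\delta_{i_k}^{q_k}(\mu^{q_k}), \mu^{q_k}) = 1$. Fix $\hat{\mu}^{q_k} > \mu^{q_k}$. From Lemma 2 we have $h_{i_k}(\delta_{i_k}^{q_k}(\mu^{q_k}), \hat{\mu}^{q_k}) < 1$. To ensure $h_{i_k}(\delta_{i_k}^{q_k}(\hat{\mu}^{q_k}), \hat{\mu}^{q_k}) = 1$, we must have $\delta_{i_k}^{q_k}(\hat{\mu}^{q_k}) > \delta_{i_k}^{q_k}(\mu^{q_k})$, which gives the first part of the claim.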


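We prove the second part of the claim by contradiction. First consider the trivial case where no user is active, i.e., $\mathcal{A}^{q_k} = \emptyset$. Clearly $\sum_{i_k \in \mathcal{I}_k} \|v_{i_k}^{q_k}(\mu^{q_k})\|^2 = 0$ and the claim is proven.

Now suppose $\mathcal{A}^{q_k}$ is nonempty, that is, $|\mathcal{A}^{q_k}| > 0$. Suppose that there exists an $i_k \in \mathcal{A}^{q_k}$ such that for all $\mu^{q_k} \ge 0$,

$$\delta_{i_k}^{q_k}(\mu^{q_k}) \le \sqrt{\frac{|\mathcal{A}^{q_k}|}{P_{q_k}}}.$$

This assumption combined with Lemma 2 implies

$$h_{i_k}\Big(\sqrt{\tfrac{|\mathcal{A}^{q_k}|}{P_{q_k}}}, \mu^{q_k}\Big) \ge h_{i_k}\big(\delta_{i_k}^{q_k}(\mu^{q_k}), \mu^{q_k}\big) = 1, \quad \forall\, \mu^{q_k} \ge 0. \qquad (38)$$

However, we have that

$$\lim_{\mu^{q_k} \to \infty} h_{i_k}\Big(\sqrt{\tfrac{|\mathcal{A}^{q_k}|}{P_{q_k}}}, \mu^{q_k}\Big) = 0,$$

which contradicts (38). This suggests that for each $i_k \in \mathcal{A}^{q_k}$, there exists a $\tilde{\mu}_{i_k}^{q_k}$ such that for all $\mu^{q_k} > \tilde{\mu}_{i_k}^{q_k}$, $\delta_{i_k}^{q_k}(\mu^{q_k}) > \sqrt{\frac{|\mathcal{A}^{q_k}|}{P_{q_k}}}$. Taking $\tilde{\mu}^{q_k} = \max_{i_k \in \mathcal{A}^{q_k}} \tilde{\mu}_{i_k}^{q_k}$, we have that for all $\mu^{q_k} > \tilde{\mu}^{q_k}$, $\sum_{i_k \in \mathcal{A}^{q_k}} \big(\delta_{i_k}^{q_k}(\mu^{q_k})\big)^{-2} < P_{q_k}$. This condition implies that $\sum_{i_k \in \mathcal{I}_k} \|v_{i_k}^{q_k}(\mu^{q_k})\|^2 \le P_{q_k}$. As a result, the second part of the claim is proved.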



Appendix D
Proof of Theorem 1

Proof: Due to the equivalence relationship, it is sufficient to show that the S-WMMSE algorithm converges to a stationary solution of the problem (P2).

We first show that the BCD procedure in Table I for updating $v$ converges to the global optimal solution of problem (P3). Recall that this problem can be decomposed into $K$ independent convex subproblems of the form (P4), so it is sufficient to show that each of these problems is solved globally. Similarly to (29)-(30), problem (P4) can be expressed in its unconstrained form

$$\min_{v_k} \ \sum_{i_k \in \mathcal{I}_k} \Big( v_{i_k}^H J_k v_{i_k} - v_{i_k}^H d_{i_k} - d_{i_k}^H v_{i_k} + \lambda_k \sum_{q_k \in \mathcal{Q}_k} \|v_{i_k}^{q_k}\| \Big) - \sum_{q_k \in \mathcal{Q}_k} I(v^{q_k}).$$

The procedure in Table I is a BCD method for solving the above unconstrained problem, where each block is defined as $v^{q_k} = \{v_{i_k}^{q_k}\}_{i_k \in \mathcal{I}_k}$. Observe that i) the nonsmooth part of the objective is separable across the blocks; ii) the smooth part of the objective is differentiable; iii) each block variable $v^{q_k}$ can be solved uniquely when fixing the other variables $\{v^{p_k}\}_{p_k \ne q_k}$. According to [42, Theorem 4.1-(c)], these facts are sufficient to guarantee the convergence of this BCD procedure to a global optimal solution of the convex nonsmooth problem (P4).

To prove the convergence of the S-WMMSE algorithm to a stationary solution of problem (P2), we can again write problem (P2) in its unconstrained form, and see that the nonsmooth part of the objective is separable across the blocks of variables $v$, $u$, $w$. Furthermore, when we fix any two block variables and solve for the third, a unique optimal solution can be obtained. Applying [42, Theorem 4.1], we conclude that the S-WMMSE algorithm converges to a stationary solution of the problem (P2). ■